After completing this section, you should be able to:
- Calculate the range of a dataset
- Calculate the standard deviation of a dataset
Measures of centrality like the mean can give us only part of the picture that a dataset paints. For example, let’s say you’ve just gotten the results of a standardized test back, and your score was 138. The mean score on the test is 120. So, your score is above average! But how good is it really? If all the scores were between 100 and 140, then you know your score must be among the best. But if the scores ranged from 0 to 200, then maybe 138 is good, but not great (though still above average). Knowing how the data are spread out can help us put a particular data value in better context. In this section, we’ll look at two numbers that help us describe the spread in the data: the range and the standard deviation. These numbers are called measures of dispersion.
Our first measure of dispersion is the range, or the difference between the maximum and minimum values in the set. It’s the measure we used in the standardized test example above.
Let’s look at a couple of examples.
Finding the Range
You survey some of your friends to find out how many hours they work each week. Their responses are: 5, 20, 8, 10, 35, 12. What is the range?
The maximum value in the set is 35 and the minimum is 5, so the range is $35 - 5 = 30$.
Your Turn 8.24
For large datasets, finding the maximum and minimum values can be daunting. There are two ways to do it in a spreadsheet. First, you can ask the spreadsheet program to sort the data from smallest to largest, then find the first and last numbers on the sorted list. The second method uses built-in functions to find the minimum and maximum.
In either method, once you’ve found the maximum and minimum, all you have to do is subtract to find the range.
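The same two approaches carry over to code. The sketch below is a minimal illustration in Python (not part of the spreadsheet workflow described here), using the hours-worked responses from the earlier example; the variable names are illustrative assumptions.

```python
# Hours worked per week, from the survey example earlier in this section.
hours = [5, 20, 8, 10, 35, 12]

# Method 1: sort from smallest to largest, then read off the two ends.
ordered = sorted(hours)
range_by_sorting = ordered[-1] - ordered[0]

# Method 2: use the built-in minimum and maximum functions directly.
range_by_builtins = max(hours) - min(hours)

print(range_by_sorting, range_by_builtins)  # both are 30
```

Both methods give the same answer; the second avoids reordering the data, which matters if the original order carries meaning.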
Finding the Range with Google Sheets
The data in “AvgSAT” contains the average SAT score for students attending every institution of higher learning in the US for which data is available. What is the range of these average SAT scores?
Step 1: To find the maximum, click on an empty cell in the spreadsheet, type “=MAX(”, and then click on the letter that marks the top of the column containing the AvgSAT data. That inserts a reference to the column into our function. Then we close the parentheses and hit the enter key. The formula is replaced with the maximum value in our data: 1566.
Step 2: Using the same process (but with “MIN” instead of “MAX”), we find the minimum value is 785.
Step 3: So, the range is $1566 - 785 = 781$.
Your Turn 8.25
The range is very easy to compute, but it depends only on two of the data values in the entire set. If there happens to be just one unusually high or low data value, then the range might give a distorted measure of dispersion. Our next measure takes every single data value into account, making it more reliable.
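To see how a single unusual value can distort the range, here is a small Python sketch. It reuses the hours-worked responses from the earlier example and adds one made-up response of 80 hours, which is a hypothetical value chosen only for illustration.

```python
def data_range(values):
    """Return the difference between the largest and smallest values."""
    return max(values) - min(values)

hours = [5, 20, 8, 10, 35, 12]

# Hypothetical: one additional friend reports an unusually long 80-hour week.
hours_with_outlier = hours + [80]

print(data_range(hours))               # 30
print(data_range(hours_with_outlier))  # 75
```

One extra response more than doubles the range, even though the other six values have not changed.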
The Standard Deviation
The standard deviation is a measure of dispersion that can be interpreted as approximately the average distance of every data value from the mean. (This distance from the mean is the “deviation” in “standard deviation.”)
The standard deviation is computed as follows:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Here, $x$ represents each data value, $\bar{x}$ is the mean of the data values, $n$ is the number of data values, and the capital sigma ($\Sigma$) indicates that we take a sum.
To compute the standard deviation using the formula, we follow the steps below:
1. Compute the mean of all the data values.
2. Subtract the mean from each data value.
3. Square those differences.
4. Add up the results in step 3.
5. Divide the result in step 4 by $n - 1$.
6. Take the square root of the result in step 5.
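The six steps listed above can be sketched as a short Python function. This is a minimal illustration, not a replacement for a statistics library; the function name is hypothetical, and it assumes the divisor in step 5 is $n - 1$, which is what a spreadsheet's STDEV function uses.

```python
import math

def standard_deviation(values):
    """Follow the six steps in order (assumes at least two data values)."""
    n = len(values)
    mean = sum(values) / n                    # Step 1: compute the mean
    deviations = [x - mean for x in values]   # Step 2: subtract the mean
    squares = [d ** 2 for d in deviations]    # Step 3: square the differences
    total = sum(squares)                      # Step 4: add up the squares
    quotient = total / (n - 1)                # Step 5: divide by n - 1 (assumed divisor)
    return math.sqrt(quotient)                # Step 6: take the square root

print(standard_deviation([2, 4, 6]))  # 2.0
```

For the small dataset 2, 4, 6, the mean is 4, the squared differences are 4, 0, and 4, their sum is 8, dividing by 2 gives 4, and the square root is 2.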
Let’s see that process in action.
Computing the Standard Deviation
You surveyed some of your friends to find out how many hours they work each week. Their responses were: 5, 20, 8, 10, 35, 12. What is the standard deviation?
Let’s follow the six steps mentioned previously to compute the standard deviation.
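Step 1: Find the mean: $\bar{x} = \frac{5 + 20 + 8 + 10 + 35 + 12}{6} = \frac{90}{6} = 15$.
Step 2: Subtract the mean from each data value. To help keep track, let’s do this in a table. In the first row, we’ll list each of our data values (and we’ll label the row $x$); in the second, we’ll subtract $\bar{x} = 15$ from each data value.
|$x$||5||20||8||10||35||12|
|$x - \bar{x}$||$-10$||5||$-7$||$-5$||20||$-3$|
Step 3: Square the differences. Let’s add a row to our table for those values:
|$x$||5||20||8||10||35||12|
|$x - \bar{x}$||$-10$||5||$-7$||$-5$||20||$-3$|
|$(x - \bar{x})^2$||100||25||49||25||400||9|
Step 4: Add up those squares: $100 + 25 + 49 + 25 + 400 + 9 = 608$.
Step 5: Divide the sum by $n - 1$. Since we have 6 data values, that gives us $\frac{608}{5} = 121.6$.
Step 6: Take the square root of the result: $\sqrt{121.6} \approx 11.03$.
Thus, the standard deviation is approximately 11.03.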
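As a cross-check on the hand computation, the sketch below uses Python's standard `statistics` module on the same six responses. Its `stdev` function computes the sample standard deviation, which divides by $n - 1$; the use of Python here is an illustrative choice, not something the example itself requires.

```python
import statistics

hours = [5, 20, 8, 10, 35, 12]

mean = statistics.mean(hours)   # mean of the six responses
s = statistics.stdev(hours)     # sample standard deviation (divides by n - 1)

print(mean)          # 15
print(round(s, 2))   # 11.03
```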
Your Turn 8.26
The computation for the standard deviation is complicated, even for just a small dataset. We’d never want to compute it without technology for a large dataset! Luckily, technology makes this calculation easy.
Finding the Standard Deviation with Google Sheets
The data in “AvgSAT” contains the average SAT score for students attending every institution of higher learning in the US for which data is available. What is the standard deviation of these average SAT scores?
To find the standard deviation, we click in an empty cell in our spreadsheet and then type “=STDEV(”. Next, click on the letter at the top of the column containing our data; this will put a reference to that column into our formula. Then close the parentheses with “)” and hit the enter key. The formula is replaced with the result: 125.517.
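A whole-column calculation like this can also be sketched outside a spreadsheet. The Python example below is a minimal illustration under stated assumptions: the AvgSAT data is not reproduced here, so the code uses the six hours-worked responses from earlier as a stand-in, stored as CSV text under a hypothetical column name, "Hours". The helper `column_values` is likewise hypothetical.

```python
import csv
import io
import statistics

# Stand-in for a spreadsheet: one column named "Hours" (a hypothetical name)
# holding the survey responses from earlier in this section.
csv_text = "Hours\n5\n20\n8\n10\n35\n12\n"

def column_values(text, column):
    """Read one named column of numbers from CSV text, skipping blank cells."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row[column]) for row in reader if row[column].strip()]

hours = column_values(csv_text, "Hours")

print(max(hours) - min(hours))             # 30.0
print(round(statistics.stdev(hours), 3))   # 11.027
```

To work from a real file, the text could be read from disk instead of the inline string; the rest of the sketch would stay the same.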
Check Your Understanding
Employees at a college help desk track the number of people who request assistance each week, as listed below:
The following are data on the admission rates of the different branch campuses in the University of California system, along with the out-of-state tuition and fee cost.
|Campus||Admission Rate||Cost ($)|
Section 8.4 Exercises
For the following exercises, use the data found in “TNSchools”, which has data on many institutions of higher education in the state of Tennessee. Here are what the columns represent:
|Column||Description|
|AdmRate||Proportion of applicants that are admitted|
|UGEnr||Number of undergraduate students|
|PTUG||Proportion of undergraduates who attend part-time|
|InState||Tuition and fees for in-state students|
|OutState||Tuition and fees for out-of-state students|
|FacSal||Mean monthly faculty salary|
|Pell||Proportion of students receiving Pell Grants|
|MedDebt||Median student loan debt at degree completion|
|StartAge||Mean age at the time of entry|
|Female||Proportion of students who identify as female|
For the following exercises, use the table below, which gives the final results for the 2021 National Women’s Soccer League season. The columns are standings points (PTS; teams earn three points for a win and one point for a tie), wins (W), losses (L), ties (T), goals scored by that team (GF), and goals scored against that team (GA).
|Team||PTS||W||L||T||GF||GA|
|Portland Thorns FC||44||13||6||5||33||17|
|Chicago Red Stars||38||11||8||5||28||28|
|NJ/NY Gotham FC||35||8||5||11||29||21|
|North Carolina Courage||33||9||9||6||28||23|
|Racing Louisville FC||22||5||12||7||21||40|
|Kansas City Current||16||3||14||7||15||36|