Donna Kirk

Your Turn

8.1

1.

None of the above (there’s no sample being selected here; the entire population is being surveyed)

2.

A stratified sample is chosen so that particular groups in the population are certain to be represented. This is a stratified random sample since it made sure to sample each major.

Stratified random sample (the strata are the different majors)

3.

This is a simple random sample. A simple random sample is chosen in a way that every unit in the population has an equal chance of being selected, and the chances of a unit being selected do not depend on the units already chosen.

Simple random sample

8.2

1.

A categorical frequency distribution is a table with two columns. The first column contains all the categories present in the data, each listed once. The second column contains the frequencies of each category.

Make a table with a header that describes your categories and labels for each row. The second column is labeled “Frequency.” Check your work by making sure the sum of the second column is correct.

Major	Frequency
Biology	6
Education	1
Political science	3
Sociology	2
Undecided	4

Check the sum: 6 + 1 + 3 + 2 + 4 = 16

Major	Frequency
Biology	6
Education	1
Political Science	3
Sociology	2
Undecided	4

8.3

1.

A categorical frequency distribution is a table with two columns. The first column contains all the categories present in the data, each listed once. The second column contains the frequencies of each category.

Make a table with a header that describes your categories and labels for each row. The second column is labeled “Frequency.” Check your work by making sure the sum of the second column is correct.

Number of People in the Residence	Frequency
1	12
2	13
3	8
4	6
5	1

Check the sum: 12 + 13 + 8 + 6 + 1 = 40

Number of people in the residence	Frequency
1	12
2	13
3	8
4	6
5	1

8.4

1.

Step 1: Identify the maximum and minimum values in your data.

The max: 70

The min: 26

Step 2: Determine the bin widths. Aim for seven or eight bins. You can use a formula to determine the bin width:

$\frac{{{\rm{maximum - minimum}}}}{{\# {\rm{\ of\ bins}}}} = \frac{{70 - 26}}{8} = 5.5$

Step 3: Consider the context of the values. Since these are ages, round this to the nearby convenient value of 5 to get a bin width of 5 years.

Step 4: Create the distribution table by filling in the bins. Notice that the last bin does not follow the pattern since it is made a bit larger to include the last value.

Age Range	Frequency
25 to 29
30 to 34
35 to 39
40 to 44
45 to 49
50 to 54
55 to 59
60 to 64
65 to 70

Step 5: Fill in the frequencies.

Age Range	Frequency
25 to 29	2
30 to 34	6
35 to 39	2
40 to 44	4
45 to 49	1
50 to 54	2
55 to 59	2
60 to 64	0
65 to 70	1

Step 6: Check your work. 2 + 6 + 2 + 4 + 1 + 2 + 2 + 0 + 1 = 20

Age range	Frequency
25-29	2
30-34	6
35-39	2
40-44	4
45-49	1
50-54	2
55-59	2
60-64	0
65-70	1

(Answers may vary depending on bin boundary decisions)

8.5

1.

Major	Frequency	Proportion
Biology	6	37.5%
Education	1	6.3%
Political Science	3	18.8%
Sociology	2	12.5%
Undecided	4	25%

Note that these percentages add up to 100.1%, due to the rounding.

Step 1: To compute a proportion, you need the frequency, which you have in the table, and the total number of units that are represented in the data.

Major	Frequency
Biology	6
Education	1
Political Science	3
Sociology	2
Undecided	4

Add the frequencies: 6 + 1 + 3 + 2 + 4 = 16

Step 2: To find the proportions, divide the frequency by the total.

Major	Frequency	Proportion
Biology	6	$\frac{6}{{16}} = 37.5\%$
Education	1	$\frac{1}{{16}} = 6.25\%$
Political Science	3	$\frac{3}{{16}} = 18.75\%$
Sociology	2	$\frac{2}{{16}} = 12.5\%$
Undecided	4	$\frac{4}{{16}} = 25\%$

Step 3: Add up your proportions. You should get 100%.

37.5% + 6.25% + 18.75% + 12.5% + 25% = 100%
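
If you would like to check the arithmetic in Steps 1 through 3 with a program, here is a minimal Python sketch. It is only an illustration and not part of the original solution; the names `frequencies`, `total`, and `proportions` are chosen for this example.

```python
# Frequencies copied from the table in the solution.
frequencies = {
    "Biology": 6,
    "Education": 1,
    "Political Science": 3,
    "Sociology": 2,
    "Undecided": 4,
}

# Step 1: total number of units represented in the data.
total = sum(frequencies.values())

# Step 2: divide each frequency by the total and express it as a percentage.
proportions = {major: 100 * count / total for major, count in frequencies.items()}

for major, percent in proportions.items():
    print(f"{major}: {percent}%")

# Step 3: the unrounded percentages should add up to 100.
print("Sum of proportions:", sum(proportions.values()))
```

Working with the unrounded values is what makes the sum come out to exactly 100%; the rounded values in the answer table add up to 100.1%.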

8.6

1.

A bar graph plots percentages of different majors. The horizontal axis represents majors. The vertical axis representing percent ranges from 0 percent to 40 percent, in increments of 5 percent. The graph shows the following data. Biology: 37.5 percent. Education: 6.3 percent. Political Science: 18.8 percent. Sociology: 12.5 percent. Undecided: 25 percent.

A bar chart consists of a series of rectangles arranged side-by-side (but not touching). Each rectangle corresponds to one of the categories. All the rectangles have the same width. The height of each rectangle corresponds to either the number of units in the corresponding category or the proportion of the total units that fall into the category.

Step 1: Draw the axes with the origin at the bottom left.

Step 2: Next, place your categories evenly spaced along the bottom of the horizontal axis. The order does not matter. If they have some natural order, maintain that order. Label the horizontal axis.

You will have your five majors, along with the label “Majors.”

Step 3: Decide how to define the height of the rectangles, the frequency itself or the proportion. Mark the vertical axis with units appropriate to your choice.

Use “Percent” for your vertical axis. Label your tick marks every 5% up to a maximum of 40%.

Step 4: Draw in the rectangles. Place a horizontal mark above the labels at the height appropriate for each category.

Use the table to mark the height of your five bars.

Step 5: Draw vertical lines straight down from the edges of your mark to make a rectangle.

Step 6: Build the rest of the rectangles, making sure they all have the same width. If a category has a percentage of 0, leave a space. Do not let the rectangles touch.

8.7

1.

The chart shows the Southeast region has the highest percentage of around 25 percent.

Southeast

2.

Just over 10%

3.

Two regions have bars lower than the 5% line: Outlying Areas and Rocky Mountains.

Outlying Areas and Rocky Mtns.

8.8

1.

A pie chart represents the majors of students in the class. The pie chart shows the following data. Biology: 37.5 percent. Undecided: 25 percent. Political Science: 18.8 percent. Sociology: 12.5 percent. Education: 6.3 percent.

A pie chart consists of a circle divided into wedges, with each wedge corresponding to a category. The proportion of the area of the entire circle that each wedge represents corresponds to the proportion of the data in that category. First, enter your data table into a new Google Sheet. Next, click and drag to select the full table, including the header row. Click on the “Insert” menu, then select “Chart.” The result may be a pie chart by default. If it is not, you can change it to a pie chart using the “Chart type” drop-down menu in the Chart editor.

Major	Proportion
Biology	37.5%
Education	6.25%
Political Science	18.75%
Sociology	12.5%
Undecided	25%

Your pie chart may look a bit different than the one in the solution and that’s fine. Just be sure that the relative size of the wedges is the same and that you have labels on your pie “slices.”

8.9

1.

Stem-and-leaf plots are visualization tools that consist of a list of stems on the left and corresponding leaves on the right, separated by a line. The stems are the numbers that make up the data only up to the next-to-last digit, and the leaves are the final digits. There is one leaf for every data value (which means that leaves may be repeated).

The leaves are the numbers to the right side of the line. There are 24 leaves.

Twenty-four

2.

The three largest data values are 36, 50, and 60 miles.

The three smallest data values are 4, 6, and 7 miles.

The longest commutes are 60, 50, and 36 miles; the shortest are 4, 6, and 7 miles.

3.

Put together the stems (the numbers on the left side of the line) and leaves (the numbers on the right side of the line).

4, 6, 7, 10, 10, 10, 12, 12, 12, 14, 15, 18, 18, 20, 25, 25, 25, 30, 30, 35, 35, 36, 50, 60

8.10

1.

Stem-and-leaf plots are visualization tools that consist of a list of stems on the left and corresponding leaves on the right, separated by a line. The stems are the numbers that make up the data only up to the next-to-last digit, and the leaves are the final digits. There is one leaf for every data value (which means that leaves may be repeated).

4	7
5	9 7 4
6	9 8 7
7	8 7 5 2 2 1 0
8	9 6 5 4 4 1
9	7 7 6 3 3 1
10	7 6 3 1

4	7
5	9 7 4
6	9 8 7
7	8 7 5 2 2 1 0
8	9 6 5 4 4 1
9	7 7 6 3 3 1
10	7 6 3 1

8.11

1.

Step 1: Add data to bins.

Histograms are easy to make when the stem-and-leaf plot is already done.

4	7
5	9 7 4
6	9 8 7
7	8 7 5 2 2 1 0
8	9 6 5 4 4 1
9	7 7 6 3 3 1
10	7 6 3 1

If using a bin-width of 10, you can compute the frequency by counting the number of leaves associated with the corresponding stem:

Bin	Frequency
40-49	1
50-59	3
60-69	3
70-79	7
80-89	6
90-99	6
100-109	4

Step 2. Create the axes.

On the horizontal axis, start labeling with the lower end of the first bin (in this case, 40), and go up to the higher end of the last bin (110). Mark off the other bin boundaries, making sure they are evenly spaced.

On the vertical axis, start with zero and go up to at least the highest frequency you see in your bins (7 in this example). You could go up to 8 and make labels and tick marks at 0, 2, 4, 6, and 8.

Step 3. Draw in the bars.

Remember that the bars of a histogram touch. The heights are determined by the frequency. The first bar will cover from 40 to 50 at a height of 1. The second bar will cover from 50 to 60 at a height of 3. Continue to draw bars using your bin and frequency table until you draw a bar from 100 to 110 at a height of 4.

To check your work, notice that if you turned your stem-and-leaf plot on its side, it would have the same general shape as your histogram.

A histogram represents wins by MLB teams, 2019 season (good). The horizontal axis representing wins ranges from 40 to 110, in increments of 10. The vertical axis representing frequency ranges from 0 to 8, in increments of 2. The histogram shows the following data. 40 to 50: 1. 50 to 60: 3. 60 to 70: 3. 70 to 80: 7. 80 to 90: 6. 90 to 100: 6. 100 to 110: 4.
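
The bin counts in Step 1 can also be reproduced with a short Python sketch. This is an illustration only; the list `wins` is read off the stem-and-leaf plot above (stem = tens, leaf = ones), and the names `BIN_WIDTH` and `bin_counts` are chosen for this example.

```python
from collections import Counter

# Data values read off the stem-and-leaf plot, one row of the plot per line.
wins = [
    47,
    59, 57, 54,
    69, 68, 67,
    78, 77, 75, 72, 72, 71, 70,
    89, 86, 85, 84, 84, 81,
    97, 97, 96, 93, 93, 91,
    107, 106, 103, 101,
]

BIN_WIDTH = 10

# Map each value to the lower edge of its bin, for example 47 -> 40 and 103 -> 100.
bin_counts = Counter((value // BIN_WIDTH) * BIN_WIDTH for value in wins)

# Print the frequency table, one bin per row.
for lower in sorted(bin_counts):
    print(f"{lower}-{lower + BIN_WIDTH - 1}\t{bin_counts[lower]}")
```

Because each stem of the plot is one bin of width 10, counting values per bin is the same as counting leaves per stem.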

8.12

1.

Watch the video in the lesson.

Open a copy of the file from the lesson in Google Sheets.

See the InState tuition costs on the second tab.

Highlight ONLY columns B and C (not all three columns).

In the top menu, choose “Insert” and then choose “Chart.”

It likely will suggest “Histogram” in “Chart type.” If not, change it to “Histogram.”

Select “Customize,” then select the “Histogram” options.

Bucket size: You can change “Auto” to anything you like. You will have to select a preset number first, but then you can type in a number such as 2,500.

To add axis labels, select “Chart and axis titles.”

In the drop down, select “Horizontal axis title” and enter “Tuition.”

In the drop down, select “Vertical axis title” and enter “Frequency.”

Once you have created your histogram, you can see that the data are strongly right-skewed.

Answers may vary based on bin choices. Here’s the result for bins of width 2,500:

A histogram of in-state tuition costs at US institutions. The horizontal axis representing tuition ranges from 2500 to 75000, in increments of 2500. The vertical axis representing frequency ranges from 0 to 800, in increments of 200. The histogram shows the following data: 2500, 230. 5000, 680. 7500, 480. 10000, 390. 12500, 300. 15000, 390. 17500, 270. 20000, 175. 22500, 150. 25000, 100. 27500, 120. 30000, 150. 32500, 160. 35000, 120. 37500, 100. 40000, 90. 42500, 80. 45000, 70. 47500, 60. 50000, 50. 52500, 50. 55000, 70. 57500, 50. 60000, 10. 75000, 10. Note: all values are approximate.

The data are strongly right-skewed.

8.13

1.

A bar graph represents world records for women’s swimming events, 100 meters. The horizontal axis represents the event. The vertical axis representing time in seconds ranges from 0 to 70, in increments of 10. The bar graph shows the following data. Freestyle: 52. Backstroke: 57. Breaststroke: 64. Butterfly: 56. Note: all values are approximate.

For this visualization, the events are the labels, and the times determine the heights of the bars. Enter the information from the first two columns into a Google Sheets spreadsheet, including the headers “Event” and “Time (sec).”

Highlight your entire table, including the headers.

Then in the “Insert” menu, choose “Chart.” If the result is not already “Column chart,” then change to “Column chart.”

Change over to the “Customize” menu. Use the “Chart and axis titles” option to make the following changes if they are not already the defaults:

Chart title: “World Records for Women’s Swimming Events, 100 m”

Horizontal axis: “Event”

Vertical axis: “Time (sec)”

8.14

1.

Top ten teams by wins:

A bar graph represents the top ten MLB teams by wins, 2019 (good). The horizontal axis represents the team. The vertical axis representing wins ranges from 0 to 120, in increments of 20. The bar graph shows the following data. HOU: 107; LAD: 106; NYY: 103; MIN: 101; ATL: 97; OAK: 97; TBR: 96; CLE: 93; WSN: 93; STL: 91. Note: all values are approximate.

A bar graph represents the top ten MLB teams by wins, 2019 (bad). The horizontal axis represents the team. The vertical axis representing wins ranges from 90 to 108, in increments of 2. The bar graph shows the following data. HOU: 107; LAD: 106; NYY: 103; MIN: 101; ATL: 97; OAK: 97; TBR: 96; CLE: 93; WSN: 93; STL: 91. Note: all values are approximate.

A histogram represents wins by MLB teams, 2019 season (good). The horizontal axis representing wins ranges from 40 to 110, in increments of 10. The vertical axis representing frequency ranges from 0 to 8, in increments of 2. The histogram shows the following data. 40 to 50: 1. 50 to 60: 3. 60 to 70: 3. 70 to 80: 7. 80 to 90: 6. 90 to 100: 6. 100 to 110: 4.

A histogram represents wins by MLB teams, 2019 season (bad). The horizontal axis representing wins ranges from under 70 to 110, in increments of 10. The vertical axis representing frequency ranges from 0 to 8, in increments of 1. The histogram shows the following data. Under 70: 7. 70 to 80: 7. 80 to 90: 6. 90 to 100: 6. 100 to 110: 4.

8.15

1.

The mode is the number that occurs most often. There is a tie: two numbers, 89 and 104, each appear three times. A dataset can have more than one mode.

There are two modes: 89 and 104, each of which appears three times.

8.16

1.

The mode is the number that occurs most often. The most frequent number of people is 2, which occurs 13 times. The mode is 2.

2

8.17

1.

Suppose you have a set of data with n values, ordered from the smallest to the largest. If n is odd, the median is the data value at position $\frac{{n + 1}}{2}$ . If n is even, then add the values at the $\frac{n}{2}$ and $\frac{n}{2} + 1$ positions, then divide by 2 to find the median.

Since there are 17 data values, use the odd method.

$\frac{{n + 1}}{2} = \frac{{17 + 1}}{2} = 9$

The 9^th data value is 136.

The median is 136.

136

8.18

1.

The table lists the wins in order from largest to smallest. There are 30 data values.

Suppose you have a set of data with n values, ordered from the smallest to the largest. If n is odd, the median is the data value at position $\frac{{n + 1}}{2}$ . If n is even, then add the values at the $\frac{n}{2}$ and $\frac{n}{2} + 1$ positions, then divide by 2 to find the median.

Since there are 30 data values, use the even method.

$\frac{n}{2} = \frac{{30}}{2} = 15$ and $\frac{n}{2} + 1$ = 16

The 15^th value is 81 and the 16^th value is 84.

Add them up and divide by 2: $\frac{{81 + 84}}{2}$ = 82.5

The median is 82.5.

82.5

8.19

1.

Find the total number of data values. 12 + 13 + 8 + 6 + 1 = 40

If n is even, then add the values at the $\frac{n}{2}$ and $\frac{n}{2} + 1$ positions, then divide by 2 to find the median.

Since there are 40 data values, use the even method.

$\frac{n}{2} = \frac{{40}}{2} = 20$ and $\frac{n}{2} + 1$ = 21

To help you find the 20^th and 21^st persons, you can add a Cumulative Frequency column to the table. You can stop when you get to the 21^st person; you don’t have to finish the table.

Number of People in Residence	Frequency	Cumulative Frequency
1	12	12
2	13	12 + 13 = 25 You know the 20^th and 21^st people are in this group!

The 20^th and 21^st people live with 2 people in the residence.

Add them up and divide by 2: $\frac{{2 + 2}}{2} = 2$

The median is 2.

2
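
The cumulative-frequency idea can be sketched in Python as follows. This is an illustration under the assumption that the full frequency table is the one whose frequencies are summed at the start of the solution (12, 13, 8, 6, and 1 for residences of 1 through 5 people); the helper name `value_at_position` is hypothetical.

```python
# Frequency table: number of people in the residence -> frequency.
frequency_table = {1: 12, 2: 13, 3: 8, 4: 6, 5: 1}


def value_at_position(table, position):
    """Return the data value at a 1-based position in the sorted data."""
    cumulative = 0
    for value in sorted(table):
        cumulative += table[value]  # running cumulative frequency
        if position <= cumulative:
            return value
    raise ValueError("position is larger than the number of data values")


n = sum(frequency_table.values())  # total number of data values

if n % 2 == 1:
    # Odd case: the median is the value at position (n + 1) / 2.
    median = value_at_position(frequency_table, (n + 1) // 2)
else:
    # Even case: average the values at positions n / 2 and n / 2 + 1.
    lower = value_at_position(frequency_table, n // 2)
    upper = value_at_position(frequency_table, n // 2 + 1)
    median = (lower + upper) / 2

print("n =", n, "median =", median)
```

Like the table in the solution, the helper stops as soon as the running total reaches the position it is looking for.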

8.20

1.

The mean is the sum of the values, divided by the number of values.

$\frac{{5 + 8 + 11 + 12 + 12 + 12 + 15 + 18 + 20}}{9} = \frac{{113}}{9} \approx 12.556$

The mean is approximately 12.556.

12.556

8.21

1.

You can use the table to help do your math.

Number of People in Residence	Frequency	People Times Frequency
1	12	12
2	13	26
3	8	24
SUMS:	33	62
	Mean (Divide Sums)	62 ÷ 33 ≈ 1.879

The mean is approximately 1.879.

$\frac{{62}}{{33}} \approx 1.879$
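
The “People Times Frequency” column is a weighted sum, which a few lines of Python can reproduce. This sketch uses only the three rows shown in the solution table; the variable names are chosen for this example.

```python
# Frequency table from the solution: number of people -> frequency.
frequency_table = {1: 12, 2: 13, 3: 8}

# Sum of the "Frequency" column.
total_count = sum(frequency_table.values())

# Sum of the "People Times Frequency" column.
weighted_sum = sum(value * count for value, count in frequency_table.items())

# Mean: divide the sums.
mean = weighted_sum / total_count
print(total_count, weighted_sum, round(mean, 3))
```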

8.22

1.

You will have to download the file to be able to work with it. You can upload it to Google Sheets (or open it in Excel if you prefer). These directions assume you use Google Sheets.

After you open the Google Spreadsheet, open the MLB 2019 tab.

MODE: At the bottom of the column for “Wins,” enter “=MODE.MULT(“ and then click and drag through all the numbers in the Wins column.

The modes are the answers you see: 72, 84, 93, and 97.

Hint: If you just used “=MODE(“ you would only see one mode, the final one in the list.

MEDIAN: Use the same process, but enter “=MEDIAN(“

The median is 82.5.

MEAN: Use the same process, but enter “=AVERAGE(“

The mean is approximately 80.967.

Modes: 72, 84, 93, 97
Median: 82.5
Mean: 80.967 (rounded to three decimal places)
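
If you prefer to check these values outside a spreadsheet, Python's built-in `statistics` module offers comparable functions. The sketch below assumes that the 30 values in the stem-and-leaf plot from 8.10 are the same Wins column; the spreadsheet itself is not reproduced here, so treat the list as an assumption.

```python
import statistics

# Assumed Wins column, read off the stem-and-leaf plot in 8.10.
wins = [
    47,
    59, 57, 54,
    69, 68, 67,
    78, 77, 75, 72, 72, 71, 70,
    89, 86, 85, 84, 84, 81,
    97, 97, 96, 93, 93, 91,
    107, 106, 103, 101,
]

# multimode returns every value tied for most frequent, much like MODE.MULT.
modes = sorted(statistics.multimode(wins))
median = statistics.median(wins)
mean = statistics.mean(wins)

print("Modes:", modes)
print("Median:", median)
print("Mean:", round(mean, 3))
```

Under that assumption, the results agree with the modes, median, and mean reported above.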

8.23

1.

Mode or median would be best, since you might be worried that you will get a lot of “not interested” answers that would skew your average.

Mode or median

2.

Fuel efficiency lends itself well to the use of a mean or median. It is very likely that no number will occur twice, making the mode a useless measure.

Median or mean

3.

You might want to know both the mean and the median. The mean will be pulled to the left by the skew.

Median or mean

8.24

1.

The range is the difference between the maximum and the minimum values.

The maximum is 79.

The minimum is 12.

The range is 79 − 12 = 67 seconds.

67

8.25

1.

Open a copy of the database in Google Sheets. You cannot edit the original.

Look for the InState tab.

In an empty cell, enter “=MAX(“ and then click on the letter at the top of the Tuition column.

You should see $74,514.

In another cell, repeat the process but with “=MIN(“ to find the minimum.

You should see $480.

The range is the difference between the two, $74,034.

Note: It is a good idea to write labels beside the cells to help you remember what numbers are in which cells.

$74,514 − $480 = $74,034

8.26

1.

The formula for standard deviation uses sigma notation where sigma denotes a sum, x represents each data value, $\overline x$ represents the mean of the data values, and n is the number of data values.

$s = \sqrt {\frac{{\sum {{{\left( {x - \overline x } \right)}^2}} }}{{n - 1}}}$

It helps to make a table.

x	$x - \overline x$	$x - \overline x$	${\left( {x - \overline x } \right)^2}$
12	12 − 41	−29	841
58	58 − 41	17	289
35	35 − 41	−6	36
79	79 − 41	38	1,444
21	21 − 41	−20	400
Sum: 205			Sum: 3010
$\overline x = 205 \div 5 = 41$

$s = \sqrt {\frac{{\sum {{{\left( {x - \overline x } \right)}^2}} }}{{n - 1}}} = \sqrt {\frac{{3,010}}{{5 - 1}}} = \sqrt {\frac{{3,010}}{4}} = \sqrt {752.5} \approx 27.432$

The standard deviation is approximately 27.432 seconds.

$s = \sqrt {752.5} \approx 27.432$
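
The table-based calculation can be mirrored step by step in Python. This is a sketch for checking your work, with variable names chosen for this example; `statistics.stdev` is the standard-library function that applies the same n − 1 formula.

```python
import math
import statistics

# The five data values from the table, in seconds.
times = [12, 58, 35, 79, 21]

# Mean of the data values.
mean = sum(times) / len(times)

# Squared deviations from the mean (the last column of the table).
squared_deviations = [(x - mean) ** 2 for x in times]

# Sample standard deviation: divide by n - 1, then take the square root.
s = math.sqrt(sum(squared_deviations) / (len(times) - 1))

print(mean, sum(squared_deviations), round(s, 3))

# The standard library gives the same result in one call.
print(round(statistics.stdev(times), 3))
```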

8.27

1.

If you have not already done so, copy the original Google sheet. You cannot edit the linked Google Sheet.

Look for the InState tab in the Google Sheet. In a blank cell, enter “=STDEV(“ and click on the letter at the top of the Tuition column.

The standard deviation is $13,333.77.

If you do not see 77 cents, it is because Google Sheets likes to round to dollars. You can change that. Click on the menu item that looks like an arrow under two zeros while the focus is in that cell.

$13,333.77.

8.28

1.

If p percent of the values in a dataset are less than a number n, then we say that n is at the pth percentile.

First, put the data in increasing order: 1, 2, 4, 5, 6, 8, 8, 12, 15, 16

There are 10 data values.

Eight out of the 10 numbers, or 80%, are less than the 9^th number, 15.

The value at the 80^th percentile is 15.

15

2.

If p percent of the values in a dataset are less than a number n, then we say that n is at the pth percentile.

First, put the data in increasing order: 1, 2, 4, 5, 6, 8, 8, 12, 15, 16

There are 10 data values.

Seven out of the 10 numbers, or 70%, are less than the 8^th number, 12.

The value 12 is at the 70^th percentile.

70th percentile
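
The definition used in both parts (the percent of values less than a given number) translates directly into code. The following Python sketch is an illustration; the function name `percent_below` is hypothetical.

```python
# The data from the exercise, in increasing order.
data = [1, 2, 4, 5, 6, 8, 8, 12, 15, 16]


def percent_below(values, number):
    """Return the percent of the values that are strictly less than number."""
    below = sum(1 for v in values if v < number)
    return 100 * below / len(values)


print(percent_below(data, 15))  # percentile of the value 15
print(percent_below(data, 12))  # percentile of the value 12
```

Spreadsheet functions such as PERCENTILE and PERCENTRANK interpolate between data values, so their results can differ slightly from this simple count on small datasets.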

8.29

1.

You want the score at the 15^th percentile.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the AvgSAT tab.

Look for the Average SAT Score column.

In a blank cell, enter the function: =PERCENTILE(

Click on the letter at the top of the Average SAT Score column.

Then enter a comma, the percentile as a decimal and close the parentheses: ,0.15)

The complete function probably looks like this: =PERCENTILE(C:C,0.15)

The score you see is 1026.

1026

2.

You want the score at the 90^th percentile.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the AvgSAT tab.

Look for the Average SAT Score column.

In a blank cell, enter the function: =PERCENTILE(

Click on the letter at the top of the Average SAT Score column.

Then enter a comma, the percentile as a decimal and close the parentheses: ,0.90)

The complete function probably looks like this: =PERCENTILE(C:C,0.90)

The score you see is 1318.

(Hint: You could also modify the formula if you used this tab and formula in another exercise.)

1318

3.

You want the percentile rank of 1244.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the AvgSAT tab.

Look for the Average SAT Score column.

In a blank cell, enter the function: =PERCENTRANK(

Click on the letter at the top of the Average SAT Score column.

Then enter a comma, 1244, and close the parentheses: ,1244)

The complete function probably looks like this: =PERCENTRANK(C:C,1244).

You see the decimal value 0.827.

Convert this to a percentage to say that this is the 82.7^th percentile.

82.7th

4.

You want the percentile rank of 1513.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the AvgSAT tab.

Look for the Average SAT Score column.

In a blank cell, enter the function: =PERCENTRANK(

Click on the letter at the top of the Average SAT Score column.

Then enter a comma, 1513, and close the parentheses: ,1513)

The complete function probably looks like this: =PERCENTRANK(C:C,1513).

You see the decimal value 0.992.

Convert this to a percentage to say that this is the 99.2^nd percentile.

99.2nd

8.30

1.

You want the tuition at the 10^th percentile.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the InState tab.

Look for the Tuition column.

In a blank cell, enter the function: =PERCENTILE(

Click on the letter at the top of the Tuition column.

Then enter a comma, the percentile as a decimal and close the parentheses: ,0.1)

The complete function probably looks like this: =PERCENTILE(C:C,0.1)

The tuition you see is $3,120.

$3,120

2.

Quintiles measure in fifths, or in steps of 20%. The fourth quintile is the 80^th percentile. You want the tuition at the 80^th percentile.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the InState tab.

Look for the Tuition column.

In a blank cell, enter the function: =PERCENTILE(

Click on the letter at the top of the Tuition column.

Then enter a comma, the percentile as a decimal and close the parentheses: ,0.8)

The complete function probably looks like this: =PERCENTILE(C:C,0.8)

The tuition you see is $26,465.20.

$26,465.20

3.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the InState tab.

Look for the Tuition column.

In a blank cell, enter the function: =PERCENTRANK(

Click on the letter at the top of the Tuition column.

Then enter a comma, 6686, and close the parentheses: ,6686)

The complete function probably looks like this: =PERCENTRANK(C:C,6686).

You see the decimal value 0.323.

Convert this to a percentage to say that this is the 32.3^rd percentile.

32.3rd

4.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the InState tab.

Look for the Tuition column.

In a blank cell, enter the function: =PERCENTRANK(

Click on the letter at the top of the Tuition column.

Then enter a comma, 53922, and close the parentheses: ,53922)

The complete function probably looks like this: =PERCENTRANK(C:C,53922).

You see the decimal value 0.985.

Convert this to a percentage to say that this is the 98.5^th percentile.

98.5th

8.31

1.

The mean of a normal distribution occurs at its peak. Draw a vertical line through the peaks. Where the vertical line hits the horizontal axis marks the mean. The curve on the left has a mean of 11. The curve in the middle has a mean of 13. The curve on the right has a mean of 14.

The red (leftmost) distribution has mean 11, the blue (middle) has mean 13, and the yellow (rightmost) has mean 14.

8.32

1.

Step 1: Draw a line down from the peak.

You are told the peak is at 5.

Step 2: Starting from the peak, notice where the curve changes from curving down to curving up. Mark the point where that curvature changes, the inflection point. Draw a line down from that point. It could help you to hold an index card or other object with a perpendicular edge up to the screen.

It looks like the inflection point is around 11.

Step 3: The distance between the two numbers along the horizontal axis is an estimate of the standard deviation.

$11 - 5 = 6$

The estimate of the standard deviation is 6.

You might have been off a bit because of the difficulty of judging where the inflection point is.

6

8.33

1.

Step 1: Draw a line down from the peak. It could help you to hold an index card up to the screen. Have one edge along the horizontal axis and the other edge going through the peak. Read the position of the card on the horizontal axis. It will tell you the mean is 150.

Step 2: Starting from the peak, notice where the curve changes from curving down to curving up. Mark the point where that curvature changes, the inflection point. Draw a line down from that point. You can use your index card again to help you judge where the inflection point falls.

It looks like the inflection point is around 170.

Step 3: The distance between the two lines along the horizontal axis is an estimate of the standard deviation.

$170 - 150 = 20$

The standard deviation is 20.

You might have been off a bit because of the difficulty of judging where the inflection point is.

Mean: 150; standard deviation: 20

8.34

1.

The 68–95–99.7 Rule tells you that 68 percent of the data would fall within 1 standard deviation above or below the mean. Ninety-five percent of the data would fall within 2 standard deviations above or below the mean. Finally, 99.7 percent of the data would fall within 3 standard deviations above or below the mean.

mean − 3 SD	mean − 2 SD	mean − 1 SD	mean	mean + 1 SD	mean + 2 SD	mean + 3 SD
0 − 9	0 − 6	0 − 3	0	0 + 3	0 + 6	0 + 9
−9	−6	−3	0	3	6	9

The rule tells you that 99.7 percent of the data fall between −9 and 9.

99.7%

2.

The 68–95–99.7 Rule tells you that 68 percent of the data would fall within 1 standard deviation above or below the mean. Ninety-five percent of the data would fall within 2 standard deviations above or below the mean. Finally, 99.7 percent of the data would fall within 3 standard deviations above or below the mean.

mean − 3 SD	mean − 2 SD	mean − 1 SD	mean	mean + 1 SD	mean + 2 SD	mean + 3 SD
50 − 30	50 − 20	50 − 10	50	50 + 10	50 + 20	50 + 30
20	30	40	50	60	70	80

The 68–95–99.7 Rule tells you that 95 percent of the data falls between 30 and 70.

95%

3.

The 68–95–99.7 Rule tells you that 68 percent of the data would fall within 1 standard deviation above or below the mean. Ninety-five percent of the data would fall within 2 standard deviations above or below the mean. Finally, 99.7 percent of the data would fall within 3 standard deviations above or below the mean.

mean − 3 SD	mean − 2 SD	mean − 1 SD	mean	mean + 1 SD	mean + 2 SD	mean + 3 SD
60 − 15	60 − 10	60 − 5	60	60 + 5	60 + 10	60 + 15
45	50	55	60	65	70	75

The 68–95–99.7 Rule tells you that 68 percent of the data falls between 55 and 65.

68%

8.35

1.

65 and 75

The 68–95–99.7 Rule tells you that 68 percent of the data would fall within 1 standard deviation above or below the mean. Ninety-five percent of the data would fall within 2 standard deviations above or below the mean. Finally, 99.7 percent of the data would fall within 3 standard deviations above or below the mean.

mean − 3 SD	mean − 2 SD	mean − 1 SD	mean	mean + 1 SD	mean + 2 SD	mean + 3 SD
70 − 15	70 − 10	70 − 5	70	70 + 5	70 + 10	70 + 15
55	60	65	70	75	80	85

The 68–95–99.7 Rule tells you that 68 percent of the data falls within one standard deviation either side of the mean.

The values are 65 and 75.

2.

26 and 54

The 68–95–99.7 Rule tells you that 68 percent of the data would fall within 1 standard deviation above or below the mean. Ninety-five percent of the data would fall within 2 standard deviations above or below the mean. Finally, 99.7 percent of the data would fall within 3 standard deviations above or below the mean.

mean − 3 SD	mean − 2 SD	mean − 1 SD	mean	mean + 1 SD	mean + 2 SD	mean + 3 SD
40 − 21	40 − 14	40 − 7	40	40 + 7	40 + 14	40 + 21
19	26	33	40	47	54	61

The 68–95–99.7 Rule tells you that 95 percent of the data falls within two standard deviations of the mean.

The values are 26 and 54.

3.

110 and 290

The 68–95–99.7 Rule tells you that 68 percent of the data would fall within 1 standard deviation above or below the mean. Ninety-five percent of the data would fall within 2 standard deviations above or below the mean. Finally, 99.7 percent of the data would fall within 3 standard deviations above or below the mean.

mean − 3 SD	mean − 2 SD	mean − 1 SD	mean	mean + 1 SD	mean + 2 SD	mean + 3 SD
200 − 90	200 − 60	200 − 30	200	200 + 30	200 + 60	200 + 90
110	140	170	200	230	260	290

The 68–95–99.7 Rule tells you that 99.7 percent of the data falls within three standard deviations of the mean.

The values are 110 and 290.

8.36

1.

47.5%

Consider these boundaries on the normal curve.

mean − 3 SD	mean − 2 SD	mean − 1 SD	mean	mean + 1 SD	mean + 2 SD	mean + 3 SD
500 − 300	500 − 200	500 − 100	500	500 + 100	500 + 200	500 + 300
200	300	400	500	600	700	800

Sketch the curve and shade in the area between 300 and 500.

If it were 300 to 700, it would be 95%.

However, it is just half of that.

$95 \div 2 = 47.5\%$

The 68–95–99.7 Rule tells you that 47.5% of the data values are between 300 and 500.

47.5%

2.

15.85%

Consider these boundaries on the normal curve.

mean − 3 SD	mean − 2 SD	mean − 1 SD	mean	mean + 1 SD	mean + 2 SD	mean + 3 SD
500 − 300	500 − 200	500 − 100	500	500 + 100	500 + 200	500 + 300
200	300	400	500	600	700	800

Sketch the curve and shade in the area between 600 and 800.

What if it were 500 to 800?

If it were 200 to 800, it would be 99.7%.

However, 500 to 800 is just half of that.

$99.7 \div 2 = 49.85\%$

49.85% of the data falls between 500 and 800.

Now you need to subtract off the data between 500 and 600.

Since 600 is one SD from the mean, you know there is 34% of the area between 500 and 600.

Subtract 34% from 49.85% to get 15.85%.

The 68–95–99.7 Rule tells you that 15.85% of the data values are between 600 and 800.

15.85%

3.

81.5%

Consider these boundaries on the normal curve.

mean − 3 SD	mean − 2 SD	mean − 1 SD	mean	mean + 1 SD	mean + 2 SD	mean + 3 SD
500 − 300	500 − 200	500 − 100	500	500 + 100	500 + 200	500 + 300
200	300	400	500	600	700	800

Sketch the curve and shade in the area between 400 and 700.

Handle the portion left of the peak and right of the peak separately.

Left of the peak, from 400 to 500, is half of the region within one standard deviation of the mean. This is half of 68%, or 34% of the area.

Right of the peak, from 500 to 700, is half of the region within two standard deviations of the mean. This is half of 95%, or 47.5% of the area.

Add 34% and 47.5% to get 81.5%.

The 68–95–99.7 Rule tells you that 81.5% of the data values are between 400 and 700.

81.5%
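The halving, adding, and subtracting used in these three problems can be collected into one small routine. This is a minimal Python sketch that only knows the three percentages from the 68–95–99.7 Rule; the names `WITHIN`, `signed_half`, and `percent_between` are invented for this illustration.

```python
# Percent of data within k standard deviations of the mean,
# as stated by the 68-95-99.7 Rule (k = 0 is included for convenience).
WITHIN = {0: 0.0, 1: 68.0, 2: 95.0, 3: 99.7}

def signed_half(k):
    """Percent of data between the mean and k SDs; negative when k is below the mean."""
    half = WITHIN[abs(k)] / 2
    return half if k >= 0 else -half

def percent_between(k_low, k_high):
    """Percent of data between k_low and k_high standard deviations from the mean."""
    return signed_half(k_high) - signed_half(k_low)

# With mean 500 and SD 100: 300 is k = -2, 400 is k = -1, 600 is k = 1, and so on.
answers = [
    round(percent_between(-2, 0), 2),  # 300 to 500
    round(percent_between(1, 3), 2),   # 600 to 800
    round(percent_between(-1, 2), 2),  # 400 to 700
]
print(answers)
```

The sketch works only for whole numbers of standard deviations from 0 to 3, because those are the only cases the rule covers.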

8.37

1.

−1.8

1.6

−0.6

If x is a member of a normally distributed dataset with mean µ and standard deviation σ, then the standardized score for x is $z = \frac{{x - \mu }}{\sigma }$ .

$z = \frac{{x - \mu }}{\sigma } = \frac{{66 - 75}}{5} = - 1.8$

$z = \frac{{x - \mu }}{\sigma } = \frac{{83 - 75}}{5} = 1.6$

$z = \frac{{x - \mu }}{\sigma } = \frac{{72 - 75}}{5} = - 0.6$
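The same three standardized scores can be checked with a few lines of Python. This is only a sketch; the function name `z_score` is chosen for this illustration.

```python
def z_score(x, mean, sd):
    """Standardized score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Mean 75 and standard deviation 5, as in the calculations above.
scores = [z_score(x, mean=75, sd=5) for x in (66, 83, 72)]
print(scores)
```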

8.38

1.

If x is a member of a normally distributed dataset with mean µ and standard deviation σ, and you know its z-score, then $x = \mu + z \times \sigma$ .

$x = \mu + z \times \sigma = 2 + \left( { - 2.3} \right)\left( {20} \right) = 2 - 46 = - 44$

$x = \mu + z \times \sigma = 2 + \left( {1.4} \right)\left( {20} \right) = 2 + 28 = 30$

$x = \mu + z \times \sigma = 2 + \left( {0.2} \right)\left( {20} \right) = 2 + 4 = 6$

–44
30
6
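Going from a z-score back to a data value is the reverse calculation. A minimal Python sketch, using the mean and standard deviation from the work above (the function name `from_z` is invented for this illustration):

```python
def from_z(z, mean, sd):
    """Data value that sits z standard deviations from the mean."""
    return mean + z * sd

# Mean 2 and standard deviation 20, as in the calculations above.
values = [from_z(z, mean=2, sd=20) for z in (-2.3, 1.4, 0.2)]
print([round(v, 1) for v in values])
```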

8.39

1.

80th

9th

97th

In an empty cell of Google Sheets, use the function NORM.DIST.

NORM.DIST(your data value, your mean, your standard deviation, TRUE)

For 25:

Using your data: =NORM.DIST(25,20,6,TRUE)

You will see the decimal version of the percentile: 0.79761619

Convert the decimal to an approximate percentage: the 80th percentile

For 12:

Using your data: =NORM.DIST(12,20,6,TRUE)

You will see the decimal version of the percentile: 0.09121121973

Convert the decimal to an approximate percentage: the 9th percentile

For 31:

Using your data: =NORM.DIST(31,20,6,TRUE)

You will see the decimal version of the percentile: 0.9666234924

Convert the decimal to an approximate percentage: the 97th percentile
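If you do not have a spreadsheet handy, Python's standard library offers a comparable calculation. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available; its `cdf` method plays the role of NORM.DIST with the last argument set to TRUE.

```python
from statistics import NormalDist

# Mean 20 and standard deviation 6, as in the NORM.DIST calls above.
dist = NormalDist(mu=20, sigma=6)

percentiles = {x: dist.cdf(x) for x in (25, 12, 31)}
for x, p in percentiles.items():
    print(x, round(p, 4), f"about the {round(p * 100)}th percentile")
```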

8.40

1.

3.9

6.3

2.9

When your data is normally distributed and you know the percentile, mean, and standard deviation, you can find the data value with the NORM.INV function.

NORM.INV(percentile as a decimal, mean, standard deviation)

For the 25th percentile:

In an empty cell in Google Sheets: =NORM.INV(0.25, 5, 1.6)

You see: 3.9208164

The approximate data value: 3.9

For the 80th percentile:

In an empty cell in Google Sheets: =NORM.INV(0.8, 5, 1.6)

You see: 6.346593972

The approximate data value: 6.3

For the 10th percentile:

In an empty cell in Google Sheets: =NORM.INV(0.1, 5, 1.6)

You see: 2.949517497

The approximate data value: 2.9
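The reverse lookup can also be sketched in Python. Assuming Python 3.8 or later, the `inv_cdf` method of `statistics.NormalDist` plays the role of NORM.INV.

```python
from statistics import NormalDist

# Mean 5 and standard deviation 1.6, as in the NORM.INV calls above.
dist = NormalDist(mu=5, sigma=1.6)

values = [dist.inv_cdf(p) for p in (0.25, 0.80, 0.10)]
print([round(v, 1) for v in values])
```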

8.41

1.

1.29

If x is a member of a normally distributed dataset with mean µ and standard deviation σ, then the standardized score for x is $z = \frac{{x - \mu }}{\sigma } = \frac{{715 - 565}}{{116}} = \frac{{150}}{{116}} \approx 1.29$ .

2.

94.5th

In an empty cell of Google Sheets, use the function NORM.DIST.

NORM.DIST(your data value, your mean, your standard deviation, TRUE)

Using your data: =NORM.DIST(166, 150, 10,TRUE)

You will see the decimal version of the percentile: 0.9452007083

Convert the decimal to an approximate percentage: the 94.5th percentile

3.

An LSAT score of 161 is better

Find the percentile score for both scores.

GMAT of 650:

In an empty cell of Google Sheets, use the function NORM.DIST.

NORM.DIST(your data value, your mean, your standard deviation, TRUE)

Using your data: =NORM.DIST(650, 565, 116,TRUE)

You will see the decimal version of the percentile: 0.7681471684

Convert the decimal to an approximate percentage: the 76.8th percentile

LSAT of 161:

In an empty cell of Google Sheets, use the function NORM.DIST.

NORM.DIST(your data value, your mean, your standard deviation, TRUE)

Using your data: =NORM.DIST(161, 150, 10,TRUE)

You will see the decimal version of the percentile: 0.8643339391

Convert the decimal to an approximate percentage: the 86.4th percentile

The GMAT score is at the 76.8th percentile and the LSAT score is at the 86.4th percentile. The LSAT score is better.
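The comparison can be sketched in Python as well, assuming Python 3.8 or later for `statistics.NormalDist`. The variable names are chosen for this illustration; the means and standard deviations are the ones used in the NORM.DIST calls above.

```python
from statistics import NormalDist

# Percentile (as a decimal) of each score within its own distribution.
gmat = NormalDist(mu=565, sigma=116).cdf(650)
lsat = NormalDist(mu=150, sigma=10).cdf(161)

# The score with the higher percentile is the better one.
better = "LSAT" if lsat > gmat else "GMAT"
print(round(gmat, 3), round(lsat, 3), better)
```

Comparing percentiles rather than raw scores is what makes scores from two different tests comparable.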

8.42

1.

Since the number of heads is distributed normally, it can be shown that the mean of n flips is $\frac{n}{2}$ and the standard deviation is $\frac{{\sqrt n }}{2}$ as long as $n \geq {\text{ }}20$ .

Since n is 144, $\frac{n}{2}$ is 72.

The mean is 72.

72

2.

Since the number of heads is distributed normally, it can be shown that the mean of n flips is $\frac{n}{2}$ and the standard deviation is $\frac{{\sqrt n }}{2}$ as long as $n \geq {\text{ }}20$ .

Since n is 144, $\frac{{\sqrt n }}{2} = \frac{{\sqrt {144} }}{2} = \frac{{12}}{2} = 6$ .

The standard deviation is 6.

6

3.

Since the number of heads is distributed normally, it can be shown that the mean of n flips is $\frac{n}{2}$ and the standard deviation is $\frac{{\sqrt n }}{2}$ as long as $n \geq {\text{ }}20$ .

Since n is 144, $\frac{n}{2}$ is 72.

The mean is 72.

$\frac{{\sqrt n }}{2} = \frac{{\sqrt {144} }}{2} = \frac{{12}}{2} = 6$ .

The standard deviation is 6.

In an empty cell of Google Sheets, use the function NORM.DIST.

NORM.DIST(your data value, your mean, your standard deviation, TRUE)

Your data value is 81, your mean is 72, and your standard deviation is 6.

Using your data: =NORM.DIST(81, 72, 6,TRUE)

You will see the decimal version of the percentile: 0.9331927987

Convert the decimal to an approximate percentage: the 93rd percentile

93rd
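All three parts of this exercise fit in a few lines of Python. This sketch assumes Python 3.8 or later and uses the same formulas for the mean and standard deviation of the number of heads that the solution quotes.

```python
from math import sqrt
from statistics import NormalDist

n = 144              # number of coin flips
mean = n / 2         # mean number of heads
sd = sqrt(n) / 2     # standard deviation of the number of heads

# Percentile (as a decimal) of 81 heads under the normal model.
p = NormalDist(mu=mean, sigma=sd).cdf(81)
print(mean, sd, round(p, 4))
```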

8.43

1.

Using NORM.INV: 1092.8

Using PERCENTILE: 1085

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the AvgSAT tab.

Look for the Average SAT Score column.

FIRST METHOD: NORM.INV

To use this function, you need to know the mean and standard deviation.

The function to find the mean: =AVERAGE(C:C)

The mean is approximately 1141.174114

The function to find the standard deviation: =STDEV(C:C)

The standard deviation is approximately 125.516043

When your data is normally distributed and you know the percentile, mean, and standard deviation, you can find the data value with the NORM.INV function.

NORM.INV(percentile as a decimal, mean, standard deviation)

You want the value at the 35th percentile, so the first argument is 0.35.

In an empty cell in Google Sheets: =NORM.INV(0.35, 1141.174114, 125.516043)

You see: 1092.809961

The approximate value: 1092.8

SECOND METHOD: PERCENTILE

You want the score at the 35th percentile.

In a blank cell, enter the function: =PERCENTILE(

Click on the letter at the top of the Average SAT Score column.

Then enter a comma, the percentile as a decimal and close the parentheses: ,0.35)

The complete function probably looks like this: =PERCENTILE(C:C,0.35)

The number you see is 1085.

The NORM.INV method yielded 1092.8 while the PERCENTILE method gave you 1085.

8.44

1.

If one variable seems to depend on the other, it is the response (or dependent) variable. The dataset that the response variable depends on is the explanatory (or independent) variable. In this situation, a person’s income depends on age. Income is the response variable.

Income

2.

If one variable seems to depend on the other, it is the response (or dependent) variable. The dataset that the response variable depends on is the explanatory (or independent) variable. In this situation, both variables are influenced by a student’s academic ability. So, either could be chosen as the response variable.

Either; neither one seems to directly influence the other (they’re both influenced by the student’s academic ability)

3.

If one variable seems to depend on the other, it is the response (or dependent) variable. The dataset that the response variable depends on is the explanatory (or independent) variable. In this situation, a student’s GPA depends on the hours spent studying. The GPA is the response variable.

GPA

8.45

1.

A scatter plot is a visualization of the relationship between two sets of data. You turn the dataset into ordered pairs. The first coordinate is from the explanatory dataset and the second coordinate contains the corresponding value from the response dataset. Plot these points in the xy-plane.

Use the receptions as the x-coordinates and the yards as the y-coordinates.

(127, 1535), (115, 1374), (115, 1407), (114, 1407), (105, 1416)

A scatter plot shows five points. The horizontal axis representing receptions ranges from 105 to 130, in increments of 5. The vertical axis representing yards ranges from 1100 to 1600, in increments of 100. The points are at the following coordinates: (105, 1420), (107, 1200), (115, 1380), (115, 1410), and (127, 1540). Note: all values are approximate.

8.46

1.

A scatter plot represents PTS versus W. The horizontal axis representing W ranges from 25 to 65, in increments of 5. The vertical axis representing PTS ranges from 50 to 130, in increments of 20. The points are arranged in increasing order. Some of the points are as follows: (32, 70), (35, 80), (40, 88), (45, 100), and (50, 108). Note: all values are approximate.

In Google Sheets, highlight the data including the headers.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the NHL19 tab.

It will be easier for Google to suggest what you want if your columns are adjacent. Copy the Points column to the left of the Wins column.

Highlight the columns for PTS (Points) and W (Wins), (including the headers).

Choose Insert, then choose Chart.

It may suggest a line chart in “Chart type.”

In the Chart type drop down box, choose a scatter plot.

Make sure that you see “W” on the horizontal axis.

8.47

1.

No curved pattern
Strong negative relationship, $r \approx - 0.9$

a. No, this is not curved.

b. This has a fairly strong negative relationship, with a possible r value of −0.9.

The correlation coefficient (r) denotes the strength and direction of a linear relationship. Positive values near 1 indicate that as one variable increases, the other does too. The closer r is to −1, the more likely that as one variable increases, the other decreases. If r is zero, there is no linear relationship. A value of plus or minus 0.7 is considered the dividing line between strong and weak relationships. For this relationship, your guess could have been anywhere between −0.8 and −0.95. To know r for certain, you would need to do calculations using the coordinates of the points.

2.

Curved pattern

a. Yes, this is curved.

3.

No curved pattern
No apparent relationship, $r \approx 0$

a. No, this is not curved.

b. This shows no apparent relationship, with an r value of about 0.

The correlation coefficient (r) denotes the strength and direction of a linear relationship. Positive values near 1 indicate that as one variable increases, the other does too. The closer r is to −1, the more likely that as one variable increases, the other decreases. If r is zero, there is no linear relationship. A value of plus or minus 0.7 is considered the dividing line between strong and weak relationships.

4.

No curved pattern
Weak positive relationship, $r \approx 0.6$

a. No, this is not curved.

b. This has a weak positive relationship, with an r value of 0.6.

The correlation coefficient (r) denotes the strength and direction of a linear relationship. Positive values near 1 indicate that as one variable increases, the other does too. The closer r is to −1, the more likely that as one variable increases, the other decreases. If r is zero, there is no linear relationship. A value of plus or minus 0.7 is considered the dividing line between strong and weak relationships. Your estimate of r may have been different. To know r for certain, you would need to do calculations using the coordinates of the points.

8.48

1.

$r = - 0.91$
Not appropriate
$r = - 0.01$
$r = 0.62$

8.49

1.

$y = 0.75\left( {\frac{{20}}{5}} \right)\left( {x - 100} \right) + 200 = 3x - 100$

If x and y are explanatory and response datasets with means $\overline x$ and $\overline y$ respectively, standard deviations ${s_x}$ and ${s_y}$ respectively, and correlation coefficient r, then the equation of the regression line is $y = r\left( {\frac{{{s_y}}}{{{s_x}}}} \right)\left( {x - \overline x } \right) + \overline y$

$y = 0.75\left( {\frac{{20}}{5}} \right)\left( {x - 100} \right) + 200$

$y = 0.75\left( 4 \right)\left( {x - 100} \right) + 200$

$y = 3\left( {x - 100} \right) + 200$

$y = 3x - 300 + 200$

$y = 3x - 100$
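The algebra above amounts to computing a slope and an intercept from the summary statistics. Here is a minimal Python sketch of that step, based on the form $y = r\left( {\frac{{{s_y}}}{{{s_x}}}} \right)\left( {x - \overline x } \right) + \overline y$; the function name `regression_line` is invented for this illustration.

```python
def regression_line(r, s_x, s_y, x_bar, y_bar):
    """Return (slope, intercept) of the regression line y = slope * x + intercept."""
    slope = r * s_y / s_x
    # Expanding slope * (x - x_bar) + y_bar gives this intercept.
    intercept = y_bar - slope * x_bar
    return slope, intercept

# r = 0.75, s_x = 5, s_y = 20, mean of x = 100, mean of y = 200, as above.
slope, intercept = regression_line(r=0.75, s_x=5, s_y=20, x_bar=100, y_bar=200)
print(slope, intercept)
```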

8.50

1.

In Google Sheets, highlight the data including the headers.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the NHL19 tab.

It will be easier for Google to suggest what you want if your columns are adjacent. Copy the Points column to the left of the Wins column.

Highlight the columns for PTS (Points) and W (Wins) (including the headers).

Choose Insert, then choose Chart.

It may suggest a line chart in “Chart type.”

In the Chart type drop down box, choose a scatter plot.

Make sure that the “Series” selected are PTS vs W.

Once you have the scatter plot, Google Sheets can generate the regression line. Click the three dots at the top right of the plot.

Then select “Edit chart.”

Then click on “Customize” and “Series.”

Then add the regression line by checking the box next to “Trendline.”

Show the equation by selecting “Use Equation” in the drop-down menu under “Label.”

The equation appears on the graph: 1.8*x + 16.9

The “y = ” is implied.

$y{\text{ }} = 1.8x\; + {\text{ }}16.9$

$y = 1.8x + 16.9$

8.51

1.

$y = 0.174x + 14.5$

2.

26.68

3.

Predicted: 28; actual: 18. The Phillies were caught around 10 fewer times than expected.

4.

Every 10 additional steal attempts will result in getting caught about 1.7 times on average.

8.52

1.

In Google Sheets, highlight the data including the headers.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the WNBA2019 tab.

It will be easier for Google to suggest what you want if your columns are adjacent. First create a blank column to the right of the column FG%. Then copy the column 3P% into the blank column.

Highlight the columns for FG% and 3P% (including the headers).

Choose Insert, then choose Chart.

It may suggest a line chart in “Chart type.”

In the Chart type drop down box, choose a scatter plot.

Make sure that FG% is on the horizontal axis.

Once you have the scatter plot, Google Sheets can generate the regression line. Click the three dots at the top right of the plot.

Then select “Edit chart.”

Then click on “Customize” and “Series.”

Then add the regression line by checking the box next to “Trendline.”

Show the equation by selecting “Use Equation” in the drop-down menu under “Label.”

The equation appears on the graph: 0.548*x + 0.106

The “y = ” is implied.

$y{\text{ }} = 0.548x\; + {\text{ }}0.106$

$y = 0.5481x + 0.1055$, where x is the proportion of made field goals and y is the proportion of made three-point field goals

2.

0.347

Use the equation, $y{\text{ }} = 0.548x\; + {\text{ }}0.106$ , where x is the proportion of field goals and y is the proportion of three-point field goals.

Substitute 0.44 for x.

$0.548\left( {0.44} \right){\text{ }} + {\text{ }}0.106{\text{ }} = {\text{ }}0.347$

You would expect a proportion of 0.347.
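Once you have the regression equation, making a prediction is a single substitution. A minimal Python sketch using the rounded slope and intercept from the equation above (the function name `predict` is chosen for this illustration):

```python
def predict(x, slope=0.548, intercept=0.106):
    """Predicted three-point proportion for a field goal proportion x."""
    return slope * x + intercept

prediction = predict(0.44)
print(round(prediction, 3))
```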

3.

Predicted: 0.340; actual: 0.368. The Aces made about 2.8% more of their three-point shots than expected.

4.

The regression equation is $y{\text{ }} = 0.548x\; + {\text{ }}0.106$ .

The slope is 0.548.

This tells you that an increase of one percent in made field goal attempts will result, in the long term, in an expected increase of around 0.55 percent in made three-point field goal attempts.

An increase of 1% in made field goal attempts will result in an expected increase of 0.55% in made three-point field goal attempts.

Check Your Understanding

1.

This is a cluster sample since seniors are selected a homeroom at a time. A cluster sample is a sample where clusters of units are chosen at random, instead of choosing individual units.

Randomization is being used; cluster random sample.

2.

No randomization is being used.

3.

A stratified sample is chosen so that particular groups in the population are certain to be represented. This is a stratified random sample since they sampled within each sport.

Randomization is being used; stratified random sample.

4.

A categorical frequency distribution is a table with two columns. The first column contains all the categories present in the data, each listed once. The second column contains the frequencies of each category.

Make a table with a header that describes your categories and labels for each row. The second column is labeled “Frequency.” Check your work by making sure the sum of the second column is correct.

Genre	Frequency
Cooking	1
Non-fiction	3
Romance	4
Thriller	3
True Crime	3
Young Adult	6

Check the sum: 1 + 3 + 4 + 3 + 3 + 6 = 20, which is correct.

Genre	Frequency
Cooking	1
Non-fiction	3
Romance	4
Thriller	3
True Crime	3
Young Adult	6

5.

Step 1: Count the number of times you see each response.

Step 2: Make a table with two columns. The first column should be labeled so that the reader knows what each response means. The second should be labeled “Frequency.” Then fill in the results of the count.

Number of Classes	Frequency
1	1
2	3
3	16
4	8
5	4

Step 3: Check your work. If you add up the counts, it should be the same number as the total number of responses.

1 + 3 + 16 + 8 + 4 = 32

Number of classes	Frequency
1	1
2	3
3	16
4	8
5	4

6.

Range of Cell Phone Subscriptions Per Hundred People	Frequency
0.0 – 24.9	1
25.0 – 49.9	3
50.0 – 74.9	1
75.0 – 99.9	6
100.0 – 124.9	7
125.0 – 149.9	3
150.0 – 174.9	3
175.0 – 199.9	1

(Note: Answers may vary based on choices made about bins.)

7.

Step 1: First make a frequency table.

Classification	Frequency
Amphibian	3
Bird	5
Mammal	12
Reptile	4

A bar chart consists of a series of rectangles arranged side-by-side (but not touching). Each rectangle corresponds to one of the categories. All the rectangles have the same width. The height of each rectangle corresponds to either the number of units in the corresponding category or the proportion of the total units that fall into the category. Make a bar chart using the frequency.

Step 1: Draw the axes with the origin at the bottom left.

Step 2: Next, place your categories evenly spaced along the bottom of the horizontal axis. Label the horizontal axis with the names of your categories.

Step 3: Decide how to define the height of the rectangles, the frequency itself or the proportion. Mark the vertical axis with units appropriate to your choice. We are using the frequency, so mark the vertical axis up to 14. Add labels and tick marks at 0, 2, 4, 6, 8, 10, 12, and 14.

Step 4: Draw in the rectangles. Place a horizontal mark above the labels at the height appropriate for each category.

Step 5: Draw vertical lines straight down from the edges of your mark to make a rectangle.

Step 6: Build the rest of the rectangles, making sure they all have the same width. Do not let the rectangles touch.

A bar graph titled, animals treated. The horizontal axis represents classification. The vertical axis representing frequency ranges from 0 to 14, in increments of 2. The bar graph shows the following data. Amphibian: 3. Bird: 5. Mammal: 12. Reptile: 4.

8.

A pie chart consists of a circle divided into wedges, with each wedge corresponding to a category. The proportion of the area of the entire circle that each wedge represents corresponds to the proportion of the data in that category. First, enter your table into a new Google Sheet. Next, click and drag to select the full table, including the header row. Click on the “Insert” menu, then select “Chart.” The result may be a pie chart by default. If it is not, you can change it to a pie chart using the “Chart type” drop-down menu in the Chart editor.

Classification	Frequency
Amphibian	3
Bird	5
Mammal	12
Reptile	4

Your pie chart may look a bit different than the one in the solution and that’s fine. Just be sure that the relative size of the wedges is the same and that you have labels on your pie “slices.”

A pie chart titled, animals treated. The pie chart is divided into four unequal parts. The pie chart shows the following data. Amphibian: 3. Bird: 5. Mammal: 12. Reptile: 4.

9.

12	7
13	0 2 3 6 6 7 8 9 9
14	1 2 3 3 6 8 8
15	3 3 5 6 6 6 7 7 8 8
16	4 7 8

Stem-and-leaf plots are visualization tools that consist of a list of stems on the left and corresponding leaves on the right, separated by a line. The stems are the numbers that make up the data only up to the next-to-last digit, and the leaves are the final digits. There is one leaf for every data value (which means that leaves may be repeated).

12	7
13	9 6 8 0 7 2 3 9 6
14	2 3 6 8 1 8 3
15	5 3 8 6 8 6 7 6 3 7
16	7 4 8

Add the frequencies to check your work: 1 + 9 + 7 + 10 + 3 = 30

10.

Step 1: Add data to bins.

Histograms are easy to make when the stem-and-leaf plot is already done. Since the problem asks for a bin width of 5, first put the leaves of the stem-and-leaf plot in order.

12	7
13	0 2 3 6 6 7 8 9 9
14	1 2 3 3 6 8 8
15	3 3 5 6 6 6 7 7 8 8
16	4 7 8

With a bin width of 5, each stem splits into two bins. Count the leaves from 0 to 4 for the first bin and the leaves from 5 to 9 for the second:

Bin	Frequency
125-129	1
130-134	3
135-139	6
140-144	4
145-149	3
150-154	2
155-159	8
160-164	1
165-169	2

Add the frequencies to check your work: 1 + 3 + 6 + 4 + 3 + 2 + 8 + 1 + 2 = 30
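If you would like to double-check the bin counts, the tally can be sketched in Python. The list `data` below is read off the ordered stem-and-leaf plot; the approach of rounding each value down to a multiple of 5 is just one way to assign bins.

```python
from collections import Counter

# Data values read from the ordered stem-and-leaf plot (stems 12 through 16).
data = [
    127,
    130, 132, 133, 136, 136, 137, 138, 139, 139,
    141, 142, 143, 143, 146, 148, 148,
    153, 153, 155, 156, 156, 156, 157, 157, 158, 158,
    164, 167, 168,
]

# Label each value by the lower end of its width-5 bin, e.g. 127 goes in the bin starting at 125.
counts = Counter((value // 5) * 5 for value in data)

for start in sorted(counts):
    print(f"{start}-{start + 4}", counts[start])
print("total:", sum(counts.values()))
```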

Step 2: Create the axes.

On the horizontal axis, start labeling with the lower end of the first bin (in this case, 125), and go up to the higher end of the last bin (170). Mark off the other bin boundaries, making sure they are evenly spaced.

On the vertical axis, start with zero and go up to at least the highest frequency you see in your bins (8 in this example). You could go up to 8 and make labels and tick marks at 2, 4, 6, and 8.

Step 3: Draw in the bars.

Remember that the bars of a histogram touch. The heights are determined by the frequency. The first bar will cover from 125 to 130 at a height of 1. The second bar will cover from 130 to 135 at a height of 3. Continue to draw bars using your bin and frequency table until you draw a bar from 165 to 170 at a height of 2.

A histogram titled, weekly help desk customers. The horizontal axis representing customers ranges from 125 to 170, in increments of 5. The vertical axis representing frequency ranges from 0 to 8, in increments of 2. The histogram shows the following data. 125 to 130: 1. 130 to 135: 3. 135 to 140: 6. 140 to 145: 4. 145 to 150: 3. 150 to 155: 2. 155 to 160: 8. 160 to 165: 1. 165 to 170: 2.

11.

For this visualization, the campuses are the labels, and the admission rates determine the heights of the bars. Enter the information into a Google Sheet spreadsheet, including the headers “Campus” and “Admission Rate.” Then in the “Insert” menu, choose “Chart.” If the result is not already “Column chart,” then change to “Column chart.”

Change over to the “Customize” menu. Use the “Chart and axis titles” option to make the following changes if they are not already the defaults:

Chart title: “Admission Rate at Different Campuses in the University of California System”

Horizontal axis: “Campus”

Vertical axis: “Admission Rate”

A bar graph titled, admission rate at different campuses in the University of California System. The horizontal axis represents campus. The vertical axis representing the admission rate ranges from 0 to 0.7, in increments of 0.1. The bar graph shows the following data. Berkeley: 0.14. Davis: 0.41. Irvine: 0.28. Los Angeles: 0.14. Merced: 0.66. Riverside: 0.5. San Diego: 0.3. Santa Barbara: 0.32. Santa Cruz: 0.48. Note: all values are approximate.

(data source: https://data.ed.gov/)

12.

A bar graph titled, out-of-state costs at University of California campuses, unbiased. The horizontal axis represents campus. The vertical axis representing cost ranges from 0 to 45000, in increments of 5000. The bar graph shows the following data. Berkeley: 43,176. Davis: 43,394. Irvine: 42,692. Los Angeles: 42,218. Merced: 42,530. Riverside: 42,819. San Diego: 43,159. Santa Barbara: 43,383. Santa Cruz: 42,952. Note: all values are approximate.

(data source: https://data.ed.gov/)

A bar graph titled, out-of-state costs at University of California campuses, biased. The horizontal axis represents campus. The vertical axis representing cost ranges from 41600 to 43600, in increments of 200. The bar graph shows the following data. Berkeley: 43,176. Davis: 43,394. Irvine: 42,692. Los Angeles: 42,218. Merced: 42,530. Riverside: 42,819. San Diego: 43,159. Santa Barbara: 43,383. Santa Cruz: 42,952. Note: all values are approximate.

(data source: https://data.ed.gov/)

13.

The mode is the most frequently occurring data value. The value 112 occurs three times, which is more than any other number. The mode is 112.

If n is even, then add the values at the $\frac{n}{2}$ and $\frac{n}{2} + 1$ positions, then divide by 2 to find the median.

Since there are 28 data values, use the even method.

$\frac{n}{2} = \frac{{28}}{2} = 14$ and $\frac{n}{2} + 1 = 15$

The 14th value is 112 and the 15th value is 114.

Add them up and divide by 2: $\frac{{112 + 114}}{2}= 113$

The median is 113.

The mean is the sum of the data values divided by the number of data values.

$\frac{{88 + 89 + 90 \cdot 2 + 97 + 102 + 105 + 106 + 107 + 108 + 111 + 112 \cdot 3 + 114 + 115 + 117 + 119 \cdot 2 + 120 \cdot 2 + 123 + 125 + 127 + 130 + 131 \cdot 2 + 134}}{{28}}$

The mean is approximately 112.64.

Mode: 112

Median: 113

Mean: 112.64
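You can confirm all three measures with Python's standard `statistics` module. In this sketch, the list `data` spells out the 28 values that appear in the mean calculation above, with repeated values written out.

```python
from statistics import mean, median, mode

data = [
    88, 89, 90, 90, 97, 102, 105, 106, 107, 108, 111,
    112, 112, 112, 114, 115, 117, 119, 119, 120, 120,
    123, 125, 127, 130, 131, 131, 134,
]

data_mode = mode(data)      # most frequently occurring value
data_median = median(data)  # average of the 14th and 15th values, since n is even
data_mean = mean(data)      # sum of the values divided by n

print(data_mode, data_median, round(data_mean, 2))
```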

14.

3

The mode is the most frequently occurring data value. The most frequently occurring number is 3. The mode is 3.

15.

Add the frequencies: 1 + 3 + 16 + 8 + 4 = 32

If n is even, then add the values at the $\frac{n}{2}$ and $\frac{n}{2} + 1$ positions, then divide by 2 to find the median.

Since there are 32 data values, use the even method.

$\frac{n}{2} = \frac{{32}}{2} = 16$ and $\frac{n}{2} + 1= 17$

You can use the table to find the 16th and 17th values.

Number of Classes	Frequency	Cumulative Frequency
1	1	1
2	3	1 + 3 = 4
3	16	4 + 16 = 20 (the 16th and 17th data values are both in this category)

The 16th and 17th values are both 3.

Add them up and divide by 2: $\frac{{3 + 3}}{2} = 3$

The median is 3.

3

16.

Let the table do your math.

Number of Classes	Frequency	Number of Classes Times Frequency
1	1	1
2	3	6
3	16	48
4	8	32
5	4	20
SUMS	32	107
	Mean	107 ÷ 32 ≈ 3.344

The mean is approximately 3.344.

$\frac{107}{32} \approx 3.344$
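The "let the table do your math" approach translates directly into code. This Python sketch stores the frequency table as a dictionary (the name `freq` is chosen for this illustration) and computes the mean from the table; as a side check, it also expands the table into a full list to find the median.

```python
from statistics import median

# Number of classes -> frequency, copied from the table above.
freq = {1: 1, 2: 3, 3: 16, 4: 8, 5: 4}

n = sum(freq.values())                                       # total number of responses
total = sum(value * count for value, count in freq.items())  # sum of "value times frequency"
table_mean = total / n

# Expand the table into one entry per response to find the median.
expanded = [value for value, count in freq.items() for _ in range(count)]
table_median = median(expanded)

print(n, total, round(table_mean, 3), table_median)
```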

17.

156

18.

You can let Google Sheets do the heavy lifting. Enter the data values in a new Google Sheet.

You can use Data, Sort to sort your data values if you like. Sorting is optional, because the MEDIAN function also works on unsorted data. In the next column over, type “=MEDIAN(” and highlight all your data values.

You will see your median is 147.

147

19.

If you already have your data values in a Google Sheet, in another cell type “=AVERAGE(“ and highlight your data values.

You will see that your mean is 147.2.

147.2

20.

Campus	Admission Rate
Berkeley	0.1484
Davis	0.4107
Irvine	0.2876
LA	0.1404
Merced	0.6617
Riverside	0.5057
San Diego	0.3006
Santa Barbara	0.322
Santa Cruz	0.4737

Enter the data values in a new Google Sheet.

In the next column over, type “=MODE.MULT(“ and highlight all your data values.

You will see an error statement. If you hover over it, you will see a message stating that your mode does not exist since no value occurs more than once. You could also say that every value is the mode since they all occur once.

In another cell type “=MEDIAN(“ and highlight all your data values.

The median is 0.322.

In another cell type “=AVERAGE(“ and highlight all your data values.

The mean is approximately 0.3612.

Mode: not useful; every value appears only once

Median: 0.322

Mean: 0.3612

21.

Campus	Cost ($)
Berkeley	$43,176
Davis	$43,394
Irvine	$42,692
LA	$42,218
Merced	$42,530
Riverside	$42,819
San Diego	$43,159
Santa Barbara	$43,383
Santa Cruz	$42,952

Enter the data values in a new Google Sheet or change the second column to Cost ($).

In the next column over, type “=MODE.MULT(“ and highlight all your data values.

You will see an error statement. If you hover over it, you will see a message stating that your mode does not exist since no value occurs more than once. You could also say that every value is the mode since they all occur once.

In another cell type “=MEDIAN(“ and highlight all your data values.

The median is $42,952.

In another cell type “=AVERAGE(“ and highlight all your data values.

The mean is approximately $42,924.78.

You may need to show more decimal places to see the 78 cents in the mean, because Google Sheets often rounds to whole dollars. The menu item to show more decimal places looks like an arrow below two zeros.

Mode: not useful; every value appears only once

Median: $42,952

Mean: $42,924.78

22.

If you have not already done so, you will need to copy the data from the InState tab into your own Google Sheet.

In an empty cell, enter “=MODE.MULT(“ and click on the Tuition column header.

Your mode is $13,380.

In another cell enter “=MEDIAN(“ and click on the Tuition column header.

The median is $11,207.

In another cell enter “=AVERAGE(“ and click on the Tuition column header.

The mean is approximately $15,476.79.

Mode: $13,380
Median: $11,207
Mean: $15,476.79

23.

Since the data are right skewed, the mean will be bigger than the median. Thus, the workers would rather use the median, while the management will prefer the mean.

24.

The range is the difference between the maximum and the minimum values.

The maximum is 10.

The minimum is 1.

The range is 10 − 1 = 9.

9

25.

The formula for standard deviation uses sigma notation where sigma denotes a sum, x represents each data value, $\overline x$ represents the mean of the data values, and n is the number of data values.

$s = \sqrt {\frac{{\sum {{{\left( {x - \overline x } \right)}^2}} }}{{n - 1}}}$

It helps to make a table.

x	$x - \overline x$	$x - \overline x$	${(x - \overline x )^2}$
1	1 − 5	−4	16
4	4 − 5	−1	1
5	5 − 5	0	0
5	5 − 5	0	0
10	10 − 5	5	25
Sum: 25			Sum: 42
$\overline x = 25 \div 5 = 5$

$s = \sqrt {\frac{{\sum {{{\left( {x - \overline x } \right)}^2}} }}{{n - 1}}} = \sqrt {\frac{{42}}{{5 - 1}}} = \sqrt {\frac{{42}}{4}} = \sqrt {10.5} \approx 3.240$

The standard deviation is 3.240.

$\sqrt {10.5} \approx 3.240$
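The table-based calculation can be mirrored step by step in Python. This sketch computes the sample standard deviation by hand, following the formula with n − 1 in the denominator, and then compares the result with `statistics.stdev` from the standard library.

```python
from math import sqrt
from statistics import stdev

data = [1, 4, 5, 5, 10]

x_bar = sum(data) / len(data)                     # mean of the data values
squared_devs = [(x - x_bar) ** 2 for x in data]   # the (x - mean)^2 column of the table
s = sqrt(sum(squared_devs) / (len(data) - 1))     # divide by n - 1, then take the square root

print(x_bar, sum(squared_devs), round(s, 3))
print(round(stdev(data), 3))  # library result, for comparison
```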

26.

The range is the difference between the maximum and the minimum values.

The maximum is 168.

The minimum is 127.

The range is 168 − 127 = 41 requests.

168 - 127 = 41

27.

Enter the values into a Google Sheet so you can use technology to help you.

In a blank cell, enter “=STDEV(“ and click on the letter at the top of the column where you entered your data values.

You should see approximately 11.306 requests.

11.306

28.

Enter the Admission values into a Google Sheet so you can use technology to help you.

In a blank cell, enter “=MAX(” and click on the letter at the top of the column where you entered your data values. You will see the maximum is 0.6617.

In a blank cell, enter “=MIN(” and click on the letter at the top of the column where you entered your data values. You will see the minimum is 0.1404.

The range is the difference: $0.6617 - 0.1404 = 0.5213$ .

Range:

0.6617 - 0.1404 = 0.5213

29.

Enter the Admission values into a Google Sheet so you can use technology to help you.

In a blank cell, enter “=STDEV(“ and click on the letter at the top of the column where you entered your data values.

You should see approximately 0.170.

Standard deviation: 0.170

30.

Enter the Cost values into a Google Sheet so you can use technology to help you.

In a blank cell, enter “=MAX” and click on the letter at the top of the column where you entered your data values. You will see the maximum is $43,394.

In a blank cell, enter “=MIN” and click on the letter at the top of the column where you entered your data values. You will see the minimum is $42.218.

The range is the difference: $\$43,394 - \$42,218 = \$1,176$.

Range:

\$ 43,394 - 42,218 = \$ 1,176

31.

Enter the Cost values into a Google Sheet so you can use technology to help you.

In a blank cell, enter “=STDEV(” and click on the letter at the top of the column where you entered your data values. Then close the parentheses and press Enter.

You should see approximately $398.37.

Standard deviation: $398.37

32.

If p percent of the values in a dataset are less than a number n, then we say that n is at the pth percentile.

The data values are already in increasing order: 10, 12, 14, 18, 21, 23, 24, 25, 29, 30

There are 10 data values.

At the 30th percentile, 30% of the values are smaller than the number.

30 percent of 10 data values is three values.

In this set, three out of the 10 numbers are less than 18.

The value at the 30th percentile is 18.

18
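The counting rule used here (p percent of the values are less than the number) can be written as a short function. The Python sketch below is only an illustration of that definition; the function names are hypothetical, and it assumes the list is already sorted and that p percent of the count comes out to a whole number smaller than the number of values.

```python
def value_at_percentile(sorted_values, p):
    """Return the value that has p percent of the data strictly below it."""
    count_below = round(p / 100 * len(sorted_values))
    # The value with exactly count_below values beneath it sits at that index.
    return sorted_values[count_below]


def percentile_of(sorted_values, value):
    """Return the percent of data values strictly less than the given value."""
    below = sum(1 for v in sorted_values if v < value)
    return 100 * below / len(sorted_values)


data = [10, 12, 14, 18, 21, 23, 24, 25, 29, 30]

print(value_at_percentile(data, 30))  # 18
print(value_at_percentile(data, 20))  # 14
print(percentile_of(data, 29))        # 80.0
print(percentile_of(data, 24))        # 60.0
```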

33.

Quintiles measure in fifths, or in steps of 20%. The first quintile is the 20th percentile.

If p percent of the values in a dataset are less than a number n, then we say that n is at the pth percentile.

The data values are already in increasing order: 10, 12, 14, 18, 21, 23, 24, 25, 29, 30

There are 10 data values.

At the 20th percentile, 20% of the values are smaller than the number.

20 percent of 10 values is 2 values.

In this set, two out of the 10 numbers are less than 14.

The value at the 20th percentile (first quintile) is 14.

14

34.

If p percent of the values in a dataset are less than a number n, then we say that n is at the pth percentile.

The data values are already in increasing order: 10, 12, 14, 18, 21, 23, 24, 25, 29, 30

There are 10 data values.

Eight of the ten values, or 80% of the data, are less than 29.

29 is at the 80th percentile.

80th

35.

If p percent of the values in a dataset are less than a number n, then we say that n is at the pth percentile.

The data values are already in increasing order: 10, 12, 14, 18, 21, 23, 24, 25, 29, 30

There are 10 data values.

Six of the ten values, or 60% of the data, are less than 24.

24 is at the 60th percentile.

60th

36.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the MLB2019 tab.

Look for the Wins column.

In a blank cell, enter the function: =PERCENTILE(

Click on the letter at the top of the Wins column.

Then enter a comma, the percentile as a decimal and close the parentheses: ,0.30)

The complete function probably looks like this: =PERCENTILE(B:B,0.3)

The number you see is 71.7.

71.7
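For readers curious about what a spreadsheet percentile function does behind the scenes, here is a rough Python sketch. It assumes the function interpolates linearly between the two nearest sorted values (the common "inclusive" rule), which is consistent with decimal answers such as 71.7. The function name `percentile_inclusive` is made up for this example, and the sample list is the ten-value dataset from the earlier percentile exercises, not the MLB data.

```python
import math


def percentile_inclusive(values, p):
    """Linear-interpolation percentile; p is a decimal between 0 and 1."""
    data = sorted(values)
    rank = p * (len(data) - 1)            # fractional position in the sorted list
    lower = math.floor(rank)
    upper = min(lower + 1, len(data) - 1)
    fraction = rank - lower
    return data[lower] + fraction * (data[upper] - data[lower])


sample = [10, 12, 14, 18, 21, 23, 24, 25, 29, 30]
p30 = percentile_inclusive(sample, 0.30)
print(round(p30, 2))  # 16.8
```

Under this assumption, interpolation can land between two data values, so it may give a slightly different number than the counting definition used by hand on a short list.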

37.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the MLB2019 tab.

Look for the Wins column.

In a blank cell, enter the function: =PERCENTILE(

Click on the letter at the top of the Wins column.

Then enter a comma, the percentile as a decimal and close the parentheses: ,0.90)

The complete function probably looks like this: =PERCENTILE(B:B,0.9)

The number you see is 101.2.

(Hint: You could also modify the formula if you used this tab and formula in another exercise.)

101.2

38.

Quartiles break the data into four parts. The first quartile is the 25th percentile.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the MLB2019 tab.

Look for the Wins column.

In a blank cell, enter the function: =PERCENTILE(

Click on the letter at the top of the Wins column.

Then enter a comma, the percentile as a decimal and close the parentheses: ,0.25)

The complete function probably looks like this: =PERCENTILE(B:B,0.25)

The number you see is 70.25.

(Hint: You could also modify the formula if you used this tab and formula in another exercise.)

70.25

39.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the MLB2019 tab.

Look for the Wins column.

In a blank cell, enter the function: = PERCENTRANK(

Click on the letter at the top of the Wins column.

Then enter a comma, 84, and close the parentheses: ,84)

The complete function probably looks like this: =PERCENTRANK(B:B,84)

You see the decimal value 0.517.

Convert this to a percentage to say that this is the 51.7th percentile.

51.7th

40.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the MLB2019 tab.

Look for the Wins column.

In a blank cell, enter the function: = PERCENTRANK(

Click on the letter at the top of the Wins column.

Then enter a comma, 96, and close the parentheses: ,96)

The complete function probably looks like this: =PERCENTRANK(B:B,96)

You see the decimal value 0.793.

Convert this to a percentage to say that this is the 79.3rd percentile.

(Hint: You could also modify the formula if you used this tab and formula in another exercise.)

79.3rd

41.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the MLB2019 tab.

Look for the Wins column.

In a blank cell, enter the function: = PERCENTRANK(

Click on the letter at the top of the Wins column.

Then enter a comma, 67, and close the parentheses: ,67)

The complete function probably looks like this: =PERCENTRANK(B:B,67)

You see the decimal value 0.138.

Convert this to a percentage to say that this is the 13.8th percentile.

(Hint: You could also modify the formula if you used this tab and formula in another exercise.)

13.8th

42.

The 68–95–99.7 Rule tells you that 68 percent of the data would fall within 1 standard deviation above or below the mean. Ninety-five percent of the data would fall within 2 standard deviations above or below the mean. Finally, 99.7 percent of the data would fall within 3 standard deviations above or below the mean.

mean − 3 SD	mean − 2 SD	mean − 1 SD	mean	mean + 1 SD	mean + 2 SD	mean + 3 SD
100 − 36	100 − 24	100 − 12	100	100 + 12	100 + 24	100 + 36
64	76	88	100	112	124	136

76 and 124 are two standard deviations above and below the mean. The 68–95–99.7 Rule tells you that 95 percent of the data values are between 76 and 124.

95%
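The cutoffs in the table can also be generated programmatically, and the percentages checked against an exact normal model. The Python sketch below is an illustration only, assuming a normal distribution with mean 100 and standard deviation 12 as in this exercise; `NormalDist` comes from Python's standard `statistics` module, and the variable names are invented for the example.

```python
from statistics import NormalDist

mean, sd = 100, 12

# Cutoffs at 1, 2, and 3 standard deviations on either side of the mean.
cutoffs = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

dist = NormalDist(mean, sd)
for k, (low, high) in cutoffs.items():
    share = dist.cdf(high) - dist.cdf(low)  # proportion between the cutoffs
    print(k, low, high, round(100 * share, 1))

# Printed rows:
# 1 88 112 68.3
# 2 76 124 95.4
# 3 64 136 99.7
```

The exact proportions round to the 68, 95, and 99.7 percent figures in the rule's name.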

43.

The 68–95–99.7 Rule tells you that 68 percent of the data would fall within 1 standard deviation above or below the mean. Ninety-five percent of the data would fall within 2 standard deviations above or below the mean. Finally, 99.7 percent of the data would fall within 3 standard deviations above or below the mean.

mean − 3 SD	mean − 2 SD	mean − 1 SD	mean	mean + 1 SD	mean + 2 SD	mean + 3 SD
100 − 36	100 − 24	100 − 12	100	100 + 12	100 + 24	100 + 36
64	76	88	100	112	124	136

Sketch the curve and shade in the area between 100 and 112.

88 and 112 are one standard deviation above and below the mean. The 68–95–99.7 Rule tells you that 68 percent of the data values are between 88 and 112. Your shaded area is half of that, so your shaded area covers 34 percent of the data.

34%

44.

The 68–95–99.7 Rule tells you that 68 percent of the data would fall within 1 standard deviation above or below the mean. Ninety-five percent of the data would fall within 2 standard deviations above or below the mean. Finally, 99.7 percent of the data would fall within 3 standard deviations above or below the mean.

mean − 3 SD	mean − 2 SD	mean − 1 SD	mean	mean + 1 SD	mean + 2 SD	mean + 3 SD
100 − 36	100 − 24	100 − 12	100	100 + 12	100 + 24	100 + 36
64	76	88	100	112	124	136

Sketch the curve and shade in the area between 100 and 112.

88 and 112 are one standard deviation above and below the mean. The 68–95–99.7 Rule tells you that 68 percent of the data values are between 88 and 112. Your shaded area is half of that, so your shaded area covers 34 percent of the data.

To find the total area below 112, add 50% of the curve below 100.

$50\% + 34\% = 84\%$

Since 84% of the data values are below 112, you know that 112 falls at the 84th percentile.

84th

45.

0.583

If x is a member of a normally distributed dataset with mean µ and standard deviation σ, then the standardized score for x is $z = \frac{{x - \mu }}{\sigma } = \frac{{107 - 100}}{{12}} \approx 0.583$.

46.

71.2

If you know a z-score, you can recover the data value by solving the standardized-score formula for x: $x = \mu + z \times \sigma = 100 + \left( { - 2.4} \right)\left( {12} \right) = 100 - 28.8 = 71.2$.
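The two directions of the standardized-score formula can be sketched as a pair of small functions. This is an illustrative Python example only; the function names `z_score` and `value_from_z` are hypothetical, and the inputs are the mean of 100 and standard deviation of 12 used in these exercises.

```python
def z_score(x, mean, sd):
    """Standardize a data value: how many standard deviations x is from the mean."""
    return (x - mean) / sd


def value_from_z(z, mean, sd):
    """Undo the standardization: recover the data value from its z-score."""
    return mean + z * sd


print(round(z_score(107, 100, 12), 3))        # 0.583
print(round(value_from_z(-2.4, 100, 12), 1))  # 71.2
```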

47.

72nd

In an empty cell of Google Sheets, use the function NORM.DIST.

NORM.DIST(your data value, your mean, your standard deviation, TRUE)

Using your data: =NORM.DIST(107, 100, 12,TRUE)

You will see the decimal version of the percentile: 0.7201655364

Convert the decimal to an approximate percentage: the 72nd percentile
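If you want to confirm the spreadsheet result outside Google Sheets, the cumulative form of NORM.DIST corresponds to the cumulative distribution function of a normal model. The sketch below is one possible Python equivalent, offered as an illustration; it uses `NormalDist` from the standard `statistics` module, and the variable names are made up.

```python
from statistics import NormalDist

# Same inputs as =NORM.DIST(107, 100, 12, TRUE)
scores = NormalDist(mu=100, sigma=12)
proportion_below = scores.cdf(107)

print(round(proportion_below, 4))     # 0.7202
print(round(100 * proportion_below))  # 72, so about the 72nd percentile
```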

48.

115.38

When your data is normally distributed and you know the percentile, mean, and standard deviation, you can find the data value with the NORM.INV function.

NORM.INV(percentile as a decimal, mean, standard deviation)

In an empty cell in Google Sheets: =NORM.INV(0.90, 100, 12)

You see: 115.3786188

The approximate value: 115.38
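NORM.INV runs the percentile calculation in reverse, and the same check can be done in Python. This is a hedged sketch rather than the textbook's method: it assumes `NormalDist.inv_cdf` from the standard `statistics` module plays the role of NORM.INV, with illustrative variable names.

```python
from statistics import NormalDist

# Same inputs as =NORM.INV(0.90, 100, 12)
scores = NormalDist(mu=100, sigma=12)
value_at_90th = scores.inv_cdf(0.90)

print(round(value_at_90th, 2))  # 115.38
```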

49.

In an empty cell of Google Sheets, use the function NORM.DIST.

NORM.DIST(your data value, your mean, your standard deviation, TRUE)

Using your data: =NORM.DIST(940, 1060, 195,TRUE)

You will see the decimal version of the percentile: 0.2691503745

Convert the decimal to an approximate percentage: the 27th percentile

27th

50.

When your data is normally distributed and you know the percentile, mean, and standard deviation, you can find the data value with the NORM.INV function.

NORM.INV(percentile as a decimal, mean, standard deviation)

Convert the 67th percentile to 0.67.

In an empty cell in Google Sheets: =NORM.INV(0.67, 21, 5)

You see: 23.19956583

The approximate value: 23.2

23.2

51.

In an empty cell of Google Sheets, use the function NORM.DIST.

NORM.DIST(your data value, your mean, your standard deviation, TRUE)

Find the percentile for the 1300 on the SAT.

Using your data: =NORM.DIST(1300, 1060, 195,TRUE)

You will see the decimal version of the percentile: 0.8907954067

Convert the decimal to an approximate percentage: the 89.1st percentile

Find the percentile for the 27 on the ACT.

In an empty cell in Google Sheets: =NORM.DIST(27, 21, 5,TRUE)

You will see the decimal version of the percentile: 0.8849303298

Convert the decimal to an approximate percentage: the 88.5th percentile

The 1300 is at the 89.1st percentile while the 27 is at the 88.5th percentile. The 1300 on the SAT is the better score.

1300 on the SAT
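The comparison can be sketched in code by putting both scores on the same percentile scale. The Python example below is illustrative only; it assumes the SAT and ACT models stated in the exercise (means 1060 and 21, standard deviations 195 and 5) and uses `NormalDist` from the standard `statistics` module with made-up variable names.

```python
from statistics import NormalDist

sat = NormalDist(mu=1060, sigma=195)
act = NormalDist(mu=21, sigma=5)

# Percent of test takers scoring below each score.
sat_percentile = 100 * sat.cdf(1300)
act_percentile = 100 * act.cdf(27)

better = "SAT" if sat_percentile > act_percentile else "ACT"
print(round(sat_percentile, 1), round(act_percentile, 1), better)  # 89.1 88.5 SAT
```

Comparing z-scores gives the same ordering, since (1300 - 1060) / 195 is about 1.23 while (27 - 21) / 5 is 1.2.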

52.

For n flips of a fair coin, the number of heads is approximately normally distributed. It can be shown that the mean is $\frac{n}{2}$ and the standard deviation is $\frac{{\sqrt n }}{2}$, as long as $n \geq 20$.

Since n is 120, the mean is $\frac{n}{2} = \frac{{120}}{2} = 60$ .

The standard deviation is $\frac{{\sqrt n }}{2} = \frac{{\sqrt {120} }}{2} \approx 5.477$ .

$\mu = 60$; $\sigma = \frac{1}{2}\sqrt {120} \approx 5.477$
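The two formulas for coin flips are easy to wrap in a small helper. The Python sketch below is just an illustration with a hypothetical function name, assuming a fair coin and at least 20 flips as stated above.

```python
import math


def heads_mean_and_sd(n):
    """Mean and standard deviation of the number of heads in n fair coin flips."""
    return n / 2, math.sqrt(n) / 2


mean, sd = heads_mean_and_sd(120)
print(mean, round(sd, 3))  # 60.0 5.477
```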

53.

For n flips of a fair coin, the number of heads is approximately normally distributed. It can be shown that the mean is $\frac{n}{2}$ and the standard deviation is $\frac{{\sqrt n }}{2}$, as long as $n \geq 20$.

Since n is 120, the mean is $\frac{n}{2} = \frac{{120}}{2} = 60$ .

The standard deviation is $\frac{{\sqrt n }}{2} = \frac{{\sqrt {120} }}{2} \approx 5.477$ .

Use your mean and standard deviation and the technology of Google Sheets.

In an empty cell of Google Sheets, use the function NORM.DIST.

NORM.DIST(your data value, your mean, your standard deviation, TRUE)

Using your data: =NORM.DIST(70, 60, 5.477,TRUE)

You will see the decimal version of the percentile: 0.9660610881

Convert the decimal to an approximate percentage: the 96.6th percentile

96.6th

54.

For n flips of a fair coin, the number of heads is approximately normally distributed. It can be shown that the mean is $\frac{n}{2}$ and the standard deviation is $\frac{{\sqrt n }}{2}$, as long as $n \geq 20$.

Since n is 120, the mean is $\frac{n}{2} = \frac{{120}}{2} = 60$ .

The standard deviation is $\frac{{\sqrt n }}{2} = \frac{{\sqrt {120} }}{2} \approx 5.477$ .

Use your mean and standard deviation and the technology of Google Sheets.

When your data is normally distributed and you know the percentile, mean, and standard deviation, you can find the data value with the NORM.INV function.

NORM.INV(percentile as a decimal, mean, standard deviation)

In an empty cell in Google Sheets: =NORM.INV(0.30, 60, 5.477)

What you see: 57.12785839

Rounded version: 57

57

55.

Quartiles break the data into fourths. You want the score at the 75th percentile.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the World Tax tab.

Look for the 2017 column which holds the tax revenue.

Since 2017 is a number, change the header to “Tax Revenue for 2017.”

USING PERCENTILE:

In a blank cell, enter the function: =PERCENTILE(

Click on the letter at the top of what used to be the 2017 column.

Then enter a comma, the percentile as a decimal and close the parentheses: ,0.75)

The complete function probably looks like this: =PERCENTILE(B:B,0.75)

The tax revenue you see is 20.7689625 or approximately 20.77%.

USING NORM.INV:

To use this function, you need to know the mean and standard deviation.

The function to find the mean: =AVERAGE(B:B)

The mean is approximately 16.52854749.

The function to find the standard deviation: =STDEV(B:B)

The standard deviation is approximately 6.172307157.

When your data is normally distributed and you know the percentile, mean, and standard deviation, you can find the data value with the NORM.INV function.

NORM.INV(percentile as a decimal, mean, standard deviation)

You want the value at the 75th percentile, so the first argument is 0.75.

In an empty cell in Google Sheets: =NORM.INV(0.75, 16.52854749, 6.172307157)

You see: 20.6917054

The approximate value: 20.69%

The PERCENTILE method gives you 20.77%. (If you forgot and left 2017 as the header, you got 20.86.)

The NORM.INV method gave you around 20.69%. Your answer could vary a bit depending on how many decimal places you kept in the mean and standard deviation.

PERCENTILE gives 20.86; NORM.INV gives 21.11.

56.

Open a copy of the file in Google Sheets. You cannot edit the original file.

Look for the World Tax tab.

Look for the 2017 column which holds the tax revenue.

Since 2017 is a number, change the header to “Tax Revenue for 2017” (otherwise you would do calculations with 2017 and treat it as one of the data values).

USING PERCENTILE:

In a blank cell, enter the function: =PERCENTILE(

Click on the letter at the top of what used to be the 2017 column.

Then enter a comma, the percentile as a decimal and close the parentheses: ,0.20)

The complete function probably looks like this: =PERCENTILE(B:B,0.2)

The tax revenue you see is 11.4804963 or approximately 11.48%.

USING NORM.INV:

To use this function, you need to know the mean and standard deviation.

The function to find the mean: =AVERAGE(B:B)

The mean is approximately 16.52854749

The function to find the standard deviation: =STDEV(B:B) (remove the header temporarily)

The standard deviation is approximately 6.172307157

When your data is normally distributed and you know the percentile, mean, and standard deviation, you can find the data value with the NORM.INV function.

NORM.INV(percentile as a decimal, mean, standard deviation)

You want the value at the 20th percentile, so the first argument is 0.20.

In an empty cell in Google Sheets: =NORM.INV(0.20, 16.52854749, 6.172307157)

You see: 11.33380273

The approximate value: 11.33%

The PERCENTILE method gives you 11.48%.

The NORM.INV method gave you around 11.33%. Your answer could vary a bit depending on how many decimal places you kept in the mean and standard deviation.

PERCENTILE gives 11.82; NORM.INV gives 11.68.

57.

PERCENTRANK gives 92.6th; NORM.DIST gives 91.9th.

58.

PERCENTRANK gives 77th; NORM.DIST gives 79.2nd.

59.

A scatter plot is a visualization of the relationship between two sets of data. You turn the dataset into ordered pairs. The first coordinate is from the explanatory dataset and the second coordinate contains the corresponding value from the response dataset. Plot these points in the xy-plane.

Create your (x, y) coordinate pairs and plot the points. For instance, the first point is (20, 13). The next is (11, 15). Continue until you plot (25, 10).

A scatter plot shows five points. The x-axis ranges from 5 to 30, in increments of 5. The y-axis ranges from 0 to 20, in increments of 5. The points are as follows: (8, 17), (11, 15), (20, 13), (22, 13), and (25, 10). Note: all values are approximate.

60.

Yes

a. Yes, this is curved.

61.

No
Weak positive relationship; $r \approx 0.5$

a. No, this is not curved.

b. This has a weak positive relationship, with a possible r value of around 0.5.

The correlation coefficient (r) denotes the strength and direction of a linear relationship. Values of r near 1 indicate that as one variable increases, the other tends to increase as well. Values near −1 indicate that as one variable increases, the other tends to decrease. If r is zero, there is no linear relationship. A value of plus or minus 0.7 is considered the dividing line between strong and weak relationships. For this relationship, your guess could have been anywhere between 0.4 and 0.6. To know r for certain, you would need to do calculations using the coordinates of the points.
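To give a sense of what "doing the calculations" involves, here is a short Python sketch of the usual Pearson correlation formula. It is an illustration only: the function name `pearson_r` is made up, and the points are the approximate coordinates listed for the scatter plot in exercise 59, not the data behind this exercise.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [8, 11, 20, 22, 25]
ys = [17, 15, 13, 13, 10]
r = pearson_r(xs, ys)
print(round(r, 2))  # -0.95
```

For those illustrative points the result is strongly negative, which matches a plot whose points fall from upper left to lower right.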

62.

0.96

63.

y = 2900x - 478

64.

You are given the regression equation, $y = 0.161x + 2645$.

$y = 0.161\left( {30,000} \right) + 2645 = \$7,475$

The predicted average monthly faculty salary is $7,475.

$7,475

65.

You are given the regression equation, $y = 0.161x + 2645$.

Substitute $34,880 for x and compare the result to $6,765.

$y = 0.161\left( {34,880} \right) + 2,645 = \$8,260.68$

$\$8,260.68 - \$6,765 = \$1,495.68$

The actual $6,765 salary is $1,495.68 less than the predicted salary.

Less than expected by $1,495.68
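The prediction and the comparison with the actual salary can be reproduced with a short function. This Python sketch is illustrative only; the function name `predicted_salary` is hypothetical, and the coefficients are the slope 0.161 and intercept 2645 from the given regression equation.

```python
def predicted_salary(tuition):
    """Predicted average monthly faculty salary from out-of-state tuition."""
    return 0.161 * tuition + 2645


prediction = predicted_salary(30_000)
# Actual minus predicted; a negative value means the actual salary is lower.
difference = 6765 - predicted_salary(34_880)

print(round(prediction, 2))  # 7475.0
print(round(difference, 2))  # -1495.68
```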

66.

The regression equation is $y = 0.161x + 2645$.

The slope is 0.161.

This tells you that for every $1,000 increase in out-of-state tuition, you expect the average monthly faculty salary to increase by $161.

For every $1,000 increase in out-of-state tuition, we expect average monthly salary to increase by $161.
