Statistics

Practice

1.1 Definitions of Statistics, Probability, and Key Terms

1.

Below is a two-way table showing the types of college sports played by men and women.

        Soccer  Basketball  Lacrosse  Total
Women   8       8           4         20
Men     4       12          4         20
Total   12      20          8         40
Table 1.28

Given these data, calculate the marginal distributions of college sports for the people surveyed.
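
As a minimal sketch of the arithmetic involved, a marginal distribution divides each row or column total by the grand total. The counts below are copied from Table 1.28; the variable names are illustrative assumptions, not part of the exercise.

```python
# Counts copied from Table 1.28 (rows: group, columns: sport).
counts = {
    "Women": {"Soccer": 8, "Basketball": 8, "Lacrosse": 4},
    "Men": {"Soccer": 4, "Basketball": 12, "Lacrosse": 4},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of sport: each column total over the grand total.
sports = list(counts["Women"])
sport_marginal = {
    sport: sum(row[sport] for row in counts.values()) / grand_total
    for sport in sports
}

# Marginal distribution of group: each row total over the grand total.
group_marginal = {
    group: sum(row.values()) / grand_total for group, row in counts.items()
}

print(sport_marginal)
print(group_marginal)
```

Each marginal distribution should sum to 1, which is a quick check on the arithmetic.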

2.

Below is a two-way table showing the types of college sports played by men and women.

        Soccer  Basketball  Lacrosse  Total
Women   8       8           4         20
Men     4       12          4         20
Total   12      20          8         40
Table 1.29

Given these data, calculate the conditional distributions for the subpopulation of women who play college sports.
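
A conditional distribution restricts attention to one row (or column) and divides each count by that row's total rather than the grand total. The sketch below assumes the "Women" row of Table 1.29; the names are illustrative only.

```python
# The "Women" row of Table 1.29.
women = {"Soccer": 8, "Basketball": 8, "Lacrosse": 4}

# Condition on the subpopulation: divide by the row total, not the grand total.
women_total = sum(women.values())
women_conditional = {sport: count / women_total for sport, count in women.items()}

for sport, proportion in women_conditional.items():
    print(f"{sport}: {proportion:.2f}")
```

The same pattern applies to any other row or column chosen as the conditioning subpopulation.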

Use the following information to answer the next five exercises. Studies are often done by pharmaceutical companies to determine the effectiveness of a treatment program. Suppose that a new viral antibody drug is currently under study. It is given to patients once the virus's symptoms have revealed themselves. Of interest is the average (mean) length of time in months patients live once they start the treatment. Two researchers each follow a different set of 40 patients with the viral disease from the start of treatment until their deaths. The following data (in months) are collected.

Researcher A: 3; 4; 11; 15; 16; 17; 22; 44; 37; 16; 14; 24; 25; 15; 26; 27; 33; 29; 35; 44; 13; 21; 22; 10; 12; 8; 40; 32; 26; 27; 31; 34; 29; 17; 8; 24; 18; 47; 33; 34

Researcher B: 3; 14; 11; 5; 16; 17; 28; 41; 31; 18; 14; 14; 26; 25; 21; 22; 31; 2; 35; 44; 23; 21; 21; 16; 12; 18; 41; 22; 16; 25; 33; 34; 29; 13; 18; 24; 23; 42; 33; 29

Determine what the key terms refer to in the example for Researcher A.

3.

population

4.

sample

5.

parameter

6.

statistic

7.

variable

1.2 Data, Sampling, and Variation in Data and Sampling

8.

Number of times per week is what type of data?

a. qualitative (categorical); b. quantitative discrete; c. quantitative continuous

Use the following information to answer the next four exercises: A study was done to determine the age, number of times per week, and the duration (amount of time) of residents using a local park in San Antonio, Texas. The first house in the neighborhood around the park was selected randomly, and then the resident of every eighth house in the neighborhood around the park was interviewed.

9.

The sampling method was

a. simple random; b. systematic; c. stratified; d. cluster

10.

Duration (amount of time) is what type of data?

a. qualitative (categorical); b. quantitative discrete; c. quantitative continuous

11.

The colors of the houses around the park are what kind of data?

a. qualitative (categorical); b. quantitative discrete; c. quantitative continuous

12.

The population is ________.

13.

Table 1.30 contains the total number of deaths worldwide as a result of earthquakes from 2000–2012.

Year Total Number of Deaths
2000 231
2001 21,357
2002 11,685
2003 33,819
2004 228,802
2005 88,003
2006 6,605
2007 712
2008 88,011
2009 1,790
2010 320,120
2011 21,953
2012 768
Total 823,856
Table 1.30

Use Table 1.30 to answer the following questions.

  1. What is the proportion of deaths between 2007 and 2012?
  2. What percent of deaths occurred before 2001?
  3. What is the percent of deaths that occurred in 2003 or after 2010?
  4. What is the fraction of deaths that happened before 2012?
  5. What kind of data is the number of deaths?
  6. Earthquakes are quantified according to the amount of energy they produce (examples are 2.1, 5.0, 6.7). What type of data is that?
  7. What contributed to the large number of deaths in 2010? In 2004? Explain.
  8. If you were asked to present these data in an oral presentation, what type of graph would you choose to present and why? Explain what features you would point out on the graph during your presentation.
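
The proportion and percent questions above all reduce to summing the deaths for a set of years and dividing by the table total. The following is a rough sketch with the counts copied from Table 1.30; the helper `share` is a hypothetical name, and treating "between 2007 and 2012" as inclusive of both endpoints is an assumption.

```python
# Yearly totals copied from Table 1.30.
deaths = {
    2000: 231, 2001: 21_357, 2002: 11_685, 2003: 33_819, 2004: 228_802,
    2005: 88_003, 2006: 6_605, 2007: 712, 2008: 88_011, 2009: 1_790,
    2010: 320_120, 2011: 21_953, 2012: 768,
}

total_deaths = sum(deaths.values())  # should match the table's Total row


def share(years):
    """Fraction of all 2000-2012 deaths that occurred in the given years."""
    return sum(deaths[year] for year in years) / total_deaths


# Assumption: "between 2007 and 2012" includes both 2007 and 2012.
share_2007_2012 = share(range(2007, 2013))
percent_before_2001 = 100 * share([2000])

print(f"{share_2007_2012:.4f}")
print(f"{percent_before_2001:.3f}%")
```

Multiplying a proportion by 100 gives the corresponding percent, and leaving the ratio unreduced gives the fraction form.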

For the following four exercises, determine the type of sampling used (simple random, stratified, systematic, cluster, or convenience).

14.

A group of test subjects is divided into twelve groups; then four of the groups are chosen at random.

15.

A market researcher polls every tenth person who walks into a store.

16.

The first 50 people who walk into a sporting event are polled on their television preferences.

17.

A computer generates 100 random numbers, and 100 people whose names correspond with the numbers on the list are chosen.


Use the following information to answer the next seven exercises: Studies are often done by pharmaceutical companies to determine the effectiveness of a treatment program. Suppose that a new viral antibody drug is currently under study. It is given to patients once the virus's symptoms have revealed themselves. Of interest is the average (mean) length of time in months patients live once they start the treatment. Two researchers each follow a different set of 40 patients with the viral disease from the start of treatment until their deaths. The following data (in months) are collected:

Researcher A: 3; 4; 11; 15; 16; 17; 22; 44; 37; 16; 14; 24; 25; 15; 26; 27; 33; 29; 35; 44; 13; 21; 22; 10; 12; 8; 40; 32; 26; 27; 31; 34; 29; 17; 8; 24; 18; 47; 33; 34

Researcher B: 3; 14; 11; 5; 16; 17; 28; 41; 31; 18; 14; 14; 26; 25; 21; 22; 31; 2; 35; 44; 23; 21; 21; 16; 12; 18; 41; 22; 16; 25; 33; 34; 29; 13; 18; 24; 23; 42; 33; 29

18.

Complete the tables using the data provided.

Survival Length (in months)  Frequency  Relative Frequency  Cumulative Relative Frequency
0.5–6.5
6.5–12.5
12.5–18.5
18.5–24.5
24.5–30.5
30.5–36.5
36.5–42.5
42.5–48.5
Table 1.31 Researcher A
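
One way to fill in such a table, sketched here for Researcher A only, is to count the values in each class, divide by the sample size for the relative frequency, and keep a running total for the cumulative relative frequency. The class boundaries come from Table 1.31; the variable names are illustrative assumptions.

```python
# Survival times (months) for Researcher A, copied from the data above.
researcher_a = [
    3, 4, 11, 15, 16, 17, 22, 44, 37, 16, 14, 24, 25, 15, 26, 27, 33, 29, 35, 44,
    13, 21, 22, 10, 12, 8, 40, 32, 26, 27, 31, 34, 29, 17, 8, 24, 18, 47, 33, 34,
]

# Class boundaries from Table 1.31; whole-number data never land on a boundary.
edges = [0.5, 6.5, 12.5, 18.5, 24.5, 30.5, 36.5, 42.5, 48.5]

n = len(researcher_a)
rows = []
running_count = 0
for low, high in zip(edges, edges[1:]):
    frequency = sum(low < value < high for value in researcher_a)
    running_count += frequency
    rows.append((f"{low}-{high}", frequency, frequency / n, running_count / n))

for label, frequency, relative, cumulative in rows:
    print(f"{label:<10} {frequency:>3} {relative:>6.3f} {cumulative:>6.3f}")
```

The frequencies should sum to the sample size and the last cumulative relative frequency should equal 1. Swapping in Researcher B's data and the boundaries of Table 1.32 follows the same pattern.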
Survival Length (in months)  Frequency  Relative Frequency  Cumulative Relative Frequency
0.5–6.5
6.5–12.5
12.5–18.5
18.5–24.5
24.5–30.5
30.5–36.5
36.5–45.5
Table 1.32 Researcher B

19.

Determine what the key term "data" refers to in the above example for Researcher A.

20.

List two reasons why the data may differ.

21.

Can you tell if one researcher is correct and the other one is incorrect? Why?

22.

Would you expect the data to be identical? Why or why not?

23.

Suggest at least two methods the researchers might use to gather random data.

24.

Suppose that the first researcher conducted his survey by randomly choosing one state in the nation and then randomly picking 40 patients from that state. What sampling method would that researcher have used?

25.

Suppose that the second researcher conducted his survey by choosing 40 patients he knew. What sampling method would that researcher have used? What concerns would you have about this data set, based upon the data collection method?

Use the following data to answer the next five exercises: Two researchers are gathering data on hours of video games played by school-aged children and young adults. They each randomly sample different groups of 150 students from the same school. They collect the following data:

Hours Played per Week  Frequency  Relative Frequency  Cumulative Relative Frequency
0–2                    26         0.17                0.17
2–4                    30         0.20                0.37
4–6                    49         0.33                0.70
6–8                    25         0.17                0.87
8–10                   12         0.08                0.95
10–12                  8          0.05                1.00
Table 1.33 Researcher A

Hours Played per Week  Frequency  Relative Frequency  Cumulative Relative Frequency
0–2                    48         0.32                0.32
2–4                    51         0.34                0.66
4–6                    24         0.16                0.82
6–8                    12         0.08                0.90
8–10                   11         0.07                0.97
10–12                  4          0.03                1.00
Table 1.34 Researcher B

26.

Give a reason why the data may differ.

27.

Would the sample size be large enough if the population is the students in the school?

28.

Would the sample size be large enough if the population is school-aged children and young adults in the United States?

29.

Researcher A concludes that most students play video games between four and six hours each week. Researcher B concludes that most students play video games between two and four hours each week. Who is correct?

30.

Suppose you were asked to present the data from researchers A and B in an oral presentation. When would a pie graph be appropriate? When would a bar graph be more desirable? Explain which features you would point out on each type of graph and what potential display problems you would try to avoid.

31.

As part of a way to reward students for participating in the survey, the researchers gave each student a gift card to a video game store. Would this affect the data if students knew about the award before the study?

Use the following data to answer the next five exercises: A pair of studies was performed to measure the effectiveness of a new software program designed to help stroke patients regain their problem-solving skills. Patients were asked to use the software program twice a day, once in the morning, and once in the evening. The studies observed 200 stroke patients recovering over a period of several weeks. The first study collected the data in Table 1.35. The second study collected the data in Table 1.36.

Group                Showed Improvement  No Improvement  Deterioration
Used program         142                 43              15
Did not use program  72                  110             18
Table 1.35

Group                Showed Improvement  No Improvement  Deterioration
Used program         105                 74              19
Did not use program  89                  99              12
Table 1.36

32.

Given what you know, which study is correct?

33.

The first study was performed by the company that designed the software program. The second study was performed by the American Medical Association. Which study is more reliable?

34.

Both groups that performed the study concluded that the software works. Is this accurate?

35.

The company takes the two studies as proof that their software causes mental improvement in stroke patients. Is this a fair statement?

36.

Patients who used the software were also a part of an exercise program whereas patients who did not use the software were not. Does this change the validity of the conclusions from Exercise 34?

37.

Is a sample size of 1,000 a reliable measure for a population of 5,000?

38.

Is a sample of 500 volunteers a reliable measure for a population of 2,500?

39.

A question on a survey reads: "Do you prefer the delicious taste of Brand X or the taste of Brand Y?" Is this a fair question?

40.

Is a sample size of two representative of a population of five?

41.

Is it possible for two experiments to be well run with similar sample sizes to get different data?

1.3 Frequency, Frequency Tables, and Levels of Measurement

42.

What type of measurement scale is being used: nominal, ordinal, interval, or ratio?

  1. High school soccer players classified by their athletic ability: superior, average, above average
  2. Baking temperatures for various main dishes: 350, 400, 325, 250, 300
  3. The colors of crayons in a 24-crayon box
  4. Social security numbers
  5. Incomes measured in dollars
  6. A satisfaction survey of a social website by number: 1 = very satisfied, 2 = somewhat satisfied, 3 = not satisfied
  7. Preferred TV shows: comedy, drama, science fiction, sports, news
  8. Time of day on an analog watch
  9. The distance in miles to the closest grocery store
  10. The dates 1066, 1492, 1644, 1947, and 1944
  11. The heights of 21–65-year-old women
  12. Common letter grades: A, B, C, D, and F

1.4 Experimental Design and Ethics

43.

Design an experiment. Identify the explanatory and response variables. Describe the population being studied and the experimental units. Explain the treatments that will be used and how they will be assigned to the experimental units. Describe how blinding and placebos may be used to counter the power of suggestion.

44.

Discuss potential violations of the rule requiring informed consent.

  1. Inmates in a correctional facility are offered good behavior credit in return for participation in a study.
  2. A research study is designed to investigate a new children’s allergy medication.
  3. Participants in a study are told that the new medication being tested is highly promising, but they are not told that only a small portion of participants will receive the new medication. Others will receive placebo treatments and traditional treatments.
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute Texas Education Agency (TEA). The original material is available at: https://www.texasgateway.org/book/tea-statistics . Changes were made to the original material, including updates to art, structure, and other content updates.

© Apr 16, 2024 Texas Education Agency (TEA). The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.