This photo shows branch lightning coming from a dark cloud and hitting the ground.
Figure 4.1 You can use probability and discrete random variables to calculate the likelihood of lightning striking the ground five times during a half-hour thunderstorm. (Credit: Leszek Leszczynski)

A student takes a ten-question, true-false quiz. Because the student had such a busy schedule, he or she could not study and guesses randomly at each answer. What is the probability of the student passing the test with a score of at least 70%?

Small companies might be interested in the number of long-distance phone calls their employees make during the peak time of the day. Suppose the historical average is 20 calls. What is the probability that the employees make more than 20 long-distance phone calls during the peak time?

These two examples illustrate two different types of probability problems involving discrete random variables. Recall that discrete data are data that you can count, that is, the random variable can only take on whole number values. A random variable describes the outcomes of a statistical experiment in words. The values of a random variable can vary with each repetition of an experiment, often called a trial.

Random Variable Notation

The upper case letter X denotes a random variable. Lower case letters like x or y denote the value of a random variable. If X is a random variable, then X is written in words, and x is given as a number.

For example, let X = the number of heads you get when you toss three fair coins. The sample space for the toss of three fair coins is TTT; THH; HTH; HHT; HTT; THT; TTH; HHH. Then, x = 0, 1, 2, 3. X is in words and x is a number. Notice that for this example, the x values are countable outcomes. Because the possible values of X can be counted as whole numbers (the x values 0, 1, 2, 3) and the outcomes are random, X is a discrete random variable.
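The three-coin example can be checked by enumeration. The following is a minimal Python sketch, not part of the original text; the names `sample_space`, `heads_per_outcome`, and `x_values` are illustrative choices.

```python
from itertools import product

# Enumerate the sample space for tossing three fair coins.
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]

# X = the number of heads; x is the value X takes on for one outcome.
heads_per_outcome = {outcome: outcome.count("H") for outcome in sample_space}

# The distinct values x that X can take on.
x_values = sorted(set(heads_per_outcome.values()))

print(sample_space)  # the eight equally likely outcomes
print(x_values)      # the countable values of X
```

Listing the outcomes this way works only because the sample space is small; the eight strings match the sample space written out above, in a different order.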

Probability Density Functions (PDF) for a Random Variable

A probability density function or probability distribution function has two characteristics:

  1. Each probability is between zero and one, inclusive.
  2. The sum of the probabilities is one.
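As a hedged illustration, the two characteristics can be verified for the three-coin random variable X (the number of heads). The sketch below assumes each of the eight outcomes is equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing three fair coins.
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give each number of heads x.
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / (total number of outcomes).
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

# Characteristic 1: each probability is between zero and one, inclusive.
each_in_range = all(0 <= p <= 1 for p in distribution.values())

# Characteristic 2: the probabilities sum to one.
total = sum(distribution.values())

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
print(each_in_range, total)
```

Exact fractions are used instead of floating-point numbers so that the sum is exactly one rather than approximately one.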

A probability density function is a mathematical formula that calculates probabilities for specific types of events, what we have been calling experiments. There is a sort of magic to a probability density function (PDF), partially because the same formula often describes very different types of events. For example, the binomial PDF will calculate probabilities for flipping coins, yes/no questions on an exam, opinions of voters in an up or down opinion poll, indeed any binary event. Other probability density functions will provide probabilities for the time until a part will fail, when a customer will arrive at the turnpike booth, the number of telephone calls arriving at a central switchboard, the growth rate of a bacterium, and on and on. There are whole families of probability density functions that are used in a wide variety of applications, including medicine, business and finance, physics and engineering, among others.

For our needs here we will concentrate on only a few probability density functions as we develop the tools of inferential statistics.

Counting Formulas and the Combinatorial Formula

To repeat, the probability of event A, P(A), is simply the number of ways the experiment will result in A, relative to the total number of possible outcomes of the experiment.

As an equation this is:

$$P(A) = \frac{\text{number of ways to get } A}{\text{total number of possible outcomes}}$$

When we looked at the sample space for flipping 3 coins we could easily write the full sample space and thus could easily count the number of events that met our desired result, e.g. x = 1, where X is the random variable defined as the number of heads.

As we have larger numbers of items in the sample space, such as a full deck of 52 cards, writing out the sample space becomes impractical.

We see that probabilities are nothing more than counting the events in each group we are interested in and dividing by the number of elements in the universe, or sample space. This is easy enough if we are counting sophomores in a Stat class, but in more complicated cases listing all the possible outcomes may take a lifetime. There are, for example, 36 possible outcomes from throwing just two six-sided dice where the random variable is the sum of the number of spots on the up-facing sides. If there were four dice then the total number of possible outcomes would become 1,296. There are more than 2.5 million possible 5 card poker hands in a standard deck of 52 cards. Obviously keeping track of all these possibilities and counting them to get at a single probability would be tedious at best.
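The dice counts quoted above can be reproduced by brute-force enumeration. This is a small Python sketch under the assumption that the dice are fair and distinguishable; the event "the sum is 7" is a hypothetical example chosen for illustration and does not come from the text.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # spots on a six-sided die

# Every ordered outcome for two dice and for four dice.
two_dice = list(product(faces, repeat=2))
four_dice = list(product(faces, repeat=4))

print(len(two_dice))   # size of the two-dice sample space
print(len(four_dice))  # size of the four-dice sample space

# P(A) = number of ways to get A / total number of possible outcomes,
# with A = "the two dice sum to 7" (an illustrative event).
ways_to_get_seven = sum(1 for roll in two_dice if sum(roll) == 7)
p_seven = Fraction(ways_to_get_seven, len(two_dice))
print(p_seven)
```

Enumeration is still feasible at this scale, but the list for four dice is already 36 times longer than the list for two, which is the growth that motivates counting formulas.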

An alternative to listing the complete sample space and counting the elements we are interested in is to skip the listing step, simply figure out how many elements the sample space contains, and do the appropriate division. If we are after a probability, we do not need to see each and every element in the sample space; we only need to know how many elements there are. Counting formulas were invented to do just this. They tell us the number of unordered subsets of a certain size that can be created from a set of unique elements. By unordered it is meant that, for example, when dealing cards, it does not matter if you got {ace, ace, ace, ace, king} or {king, ace, ace, ace, ace} or {ace, king, ace, ace, ace} and so on. Each of these subsets is the same because each has four aces and one king.

Combinatorial Formula

$$\binom{n}{x} = {}_nC_x = \frac{n!}{x!\,(n-x)!}$$

This is the formula that tells the number of unique unordered subsets of size x that can be created from n unique elements. The formula is read “n combinatorial x.” Sometimes it is read as “n choose x.” The exclamation point “!” denotes a factorial: multiply together all the whole numbers from 1 through the number before the “!”. Thus 4! is 1·2·3·4 = 24. By definition 0! = 1. The formula is called the Combinatorial Formula. It is also called the Binomial Coefficient, for reasons that will be clear shortly. While this mathematical concept was understood long before 1653, Blaise Pascal is given major credit for his proof that he published in that year. Further, he developed a generalized method of calculating the values for combinatorials known to us as Pascal's Triangle. Pascal was one of the geniuses of an era of extraordinary intellectual advancement which included the work of Galileo, René Descartes, Isaac Newton, William Shakespeare and the refinement of the scientific method, the very rationale for the topic of this text.
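The formula translates directly into code. The sketch below is one possible Python rendering; `n_choose_x` and `pascal_row` are hypothetical helper names, and the standard library function `math.comb` (Python 3.8 and later) is used only as a cross-check.

```python
from math import comb, factorial


def n_choose_x(n: int, x: int) -> int:
    """Number of unordered subsets of size x drawn from n unique elements."""
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    # n! / (x! (n - x)!); the division is always exact, so // is safe.
    return factorial(n) // (factorial(x) * factorial(n - x))


def pascal_row(n: int) -> list[int]:
    """Row n of Pascal's Triangle: the values of n choose x for x = 0..n."""
    return [n_choose_x(n, x) for x in range(n + 1)]


print(factorial(4))        # 1*2*3*4
print(factorial(0))        # defined to be 1
print(n_choose_x(4, 2))    # subsets of size 2 from 4 elements
print(pascal_row(4))       # one row of Pascal's Triangle
print(n_choose_x(52, 5) == comb(52, 5))
```

For large n the factorials become very large intermediate numbers, so in practice `math.comb` is likely the better choice; the hand-written version is shown only because it mirrors the formula term by term.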

Let’s find, the hard way, the total number of combinations of the four aces in a deck of cards if we take them two at a time. The sample space would be:

S = {(Spade, Heart), (Spade, Diamond), (Spade, Club), (Diamond, Club), (Heart, Diamond), (Heart, Club)}

There are 6 combinations; formally, six unique unordered subsets of size 2 that can be created from 4 unique elements. To use the combinatorial formula we would solve the formula as follows:

$$\binom{4}{2} = \frac{4!}{(4-2)!\,2!} = \frac{4 \cdot 3 \cdot 2 \cdot 1}{2 \cdot 1 \cdot 2 \cdot 1} = 6$$
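The four-aces example can be confirmed both ways in a few lines of Python. This is an illustrative sketch; the list `aces` simply labels each ace by its suit.

```python
from itertools import combinations
from math import comb

# The four aces, identified by suit.
aces = ["Spade", "Heart", "Diamond", "Club"]

# The hard way: list every unordered pair of aces.
pairs = list(combinations(aces, 2))
for pair in pairs:
    print(pair)

# The formula: 4 choose 2.
print(len(pairs), comb(4, 2))
```

`itertools.combinations` treats (Spade, Heart) and (Heart, Spade) as the same subset and emits only one of them, which is exactly the "unordered" property described above.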

If we wanted to know the number of unique 5 card poker hands that could be created from a 52 card deck we simply compute:

$$\binom{52}{5}$$

where 52 is the total number of unique elements from which we are drawing and 5 is the size of the group we are putting them into.
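Evaluating "52 choose 5" takes one call in Python. The sketch below also applies the counting definition of P(A) to the four-aces-and-a-king hand mentioned earlier; that probability calculation is an added illustration under the assumption that every 5 card hand is equally likely, not a result stated in the text.

```python
from fractions import Fraction
from math import comb

# Number of unique 5 card hands from a 52 card deck.
total_hands = comb(52, 5)
print(f"{total_hands:,}")

# Hands made of four aces and one king: all four aces must be present,
# so the only choice is which of the four kings completes the hand.
ways_four_aces_and_king = comb(4, 4) * comb(4, 1)

# P(A) = number of ways to get A / total number of possible outcomes.
p_hand = Fraction(ways_four_aces_and_king, total_hands)
print(p_hand)
```

No hand is ever listed here: both the numerator and the denominator come from counting formulas, which is the point of the approach.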

With the combinatorial formula we can count the number of elements in a sample space without having to write each one of them down, truly a lifetime's work for just the number of 5 card hands from a deck of 52 cards. We can now apply this tool to a very important probability density function, the hypergeometric distribution.

Remember, a probability density function computes probabilities for us. We simply put the appropriate numbers in the formula and we get the probability of specific events. However, for these formulas to work they must be applied only to cases for which they were designed.

Citation/Attribution

Want to cite, share, or modify this book? This book is Creative Commons Attribution License 4.0 and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/introductory-business-statistics/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/introductory-business-statistics/pages/1-introduction
Citation information

© Nov 29, 2017 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License 4.0 license. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.