Introductory Statistics 2e

4.4 Geometric Distribution

There are four main characteristics of a geometric experiment.

  1. A trial is repeated until a success occurs. Think of this as one or more Bernoulli trials with all failures except the last one, which is a success. In other words, you keep repeating what you are doing until the first success. Then you stop. For example, you throw a dart at a bullseye until you hit the bullseye. The first time you hit the bullseye is a "success," so you stop throwing the dart. It might take six tries until you hit the bullseye. You can think of the trials as failure, failure, failure, failure, failure, success, STOP. In theory, the number of trials could go on forever.
  2. The repeated trials are independent of each other.
  3. The probability, p, of a success and the probability, q, of a failure are the same for each trial. p + q = 1 and q = 1 - p. For example, the probability of rolling a three when you throw one fair die is 1/6. This is true no matter how many times you roll the die. Suppose you want to know the probability of getting the first three on the fifth roll. On rolls one through four, you do not get a face with a three. The probability for each of the rolls is q = 5/6, the probability of a failure. The probability of getting a first three on the fifth roll is (5/6)(5/6)(5/6)(5/6)(1/6) = 0.0804.
  4. The random variable X represents the number of the trial in which the first success occurs. That is, X = the number of independent trials until the first success.
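
The fifth-roll calculation in item 3 can be checked with a few lines of Python. This is a minimal sketch; the function name `geometric_pmf` is an illustrative choice and does not come from the text.

```python
def geometric_pmf(x: int, p: float) -> float:
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    q = 1 - p                  # probability of failure on any single trial
    return q ** (x - 1) * p    # x - 1 failures followed by one success

# First three on the fifth roll of a fair die: four failures, then a success.
prob_fifth_roll = geometric_pmf(5, 1 / 6)
print(round(prob_fifth_roll, 4))  # 0.0804
```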

The following are additional attributes of the geometric distribution:

  1. The random variable is discrete. The random variable may be defined in two ways depending upon the analyst’s interest. In the example above for throwing a die, the question was “What is the probability that the first success will be on the fifth throw?” Alternatively, the question could be asked as “What is the probability that it takes four failures before a success?” We will see that each way of asking the question will alter the form of the geometric probability density function slightly and will change the mean and standard deviation of the geometric pdf.
  2. Implicit in the random variable is that the probability of a success is constant and therefore so is the probability of a failure. For flipping a coin this is obvious, but in experiments that require skill, such as hitting a baseball or throwing a dart, one might expect learning during the experiment to alter the probability of a success. The geometric distribution cannot capture “learning,” so the historical probability of success is assumed to be constant.
  3. The geometric distribution is “memoryless.” Very few probability density functions have this property, and the geometric distribution is the only memoryless distribution with a discrete random variable. As an example, historically Major League Baseball player Jones has a record of hitting the ball for at least an advance to first base with a probability of 0.20. Jones has not had a hit in his last 10 times at bat. What is the probability that Jones will get a hit in his third time at bat? The answer ignores his 10 previous failures. All events prior to the current trial are irrelevant, which is why the distribution is called “memoryless.”

Formally: $P(X = n + k \mid X \ge k + 1) = P(X = n)$, where k = number of previous failures

Jones’s probability of a hit begins anew each time he comes to bat. This feature of the geometric distribution has a curious consequence: when parts are drawn from a manufacturing process to test for defects, the geometric distribution begins with a clean slate each time testing begins, with no consideration of previous test results. More on this when we get to the exponential probability density function.
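
The memoryless property can be checked numerically. The sketch below uses Jones's p = 0.20 and k = 10 failed at-bats from the example; the choice n = 1 (a hit on the very next at-bat) and the helper names `pmf` and `tail` are assumptions made for illustration.

```python
def pmf(x: int, p: float) -> float:
    """P(X = x): the first success occurs on trial x."""
    return (1 - p) ** (x - 1) * p

def tail(k: int, p: float) -> float:
    """P(X >= k + 1): the first k trials are all failures."""
    return (1 - p) ** k

p = 0.20   # Jones's historical probability of a hit
k = 10     # previous at-bats without a hit
n = 1      # hit on the very next at-bat (illustrative choice)

conditional = pmf(n + k, p) / tail(k, p)   # P(X = n + k | X >= k + 1)
unconditional = pmf(n, p)                  # P(X = n)
print(conditional, unconditional)          # the two values agree up to floating-point error
```

Changing n or k leaves the two printed values equal to each other, which is exactly what "memoryless" means here.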

Example 4.17

You play a game of chance that you can either win or lose (there are no other possibilities) until you lose. Your probability of losing is p = 0.57. What is the probability that it takes five games until you lose? Let X = the number of games you play until you lose (includes the losing game). Then X takes on the values 1, 2, 3, ... (could go on indefinitely). The probability question is P(x = 5).

Try It 4.17

You throw darts at a board until you hit the center area. Your probability of hitting the center area is p = 0.17. You want to find the probability that it takes eight throws until you hit the center. What values does X take on?

Example 4.18

Problem

A safety engineer feels that 35% of all industrial accidents in the plant are caused by failure of employees to follow instructions. They decide to look at the accident reports (selected randomly and replaced in the pile after reading) until they find one that shows an accident caused by failure of employees to follow instructions. On average, how many reports would the safety engineer expect to look at until they find a report showing an accident caused by employee failure to follow instructions? What is the probability that the safety engineer will have to examine at least three reports until they find a report showing an accident caused by employee failure to follow instructions?

Let X = the number of accidents the safety engineer must examine until they find a report showing an accident caused by employee failure to follow instructions. X takes on the values 1, 2, 3, .... The first question asks you to find the expected value or the mean. The second question asks you to find P(x ≥ 3). ("At least" translates to a "greater than or equal to" symbol).
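
Both quantities can be worked out directly from p = 0.35. The following is a minimal sketch; the variable names are illustrative, and the "at least three" probability relies on the observation that X ≥ 3 happens exactly when the first two reports are both failures.

```python
p = 0.35          # probability a report shows failure to follow instructions
q = 1 - p         # probability a report does not

mean_reports = 1 / p                  # expected number of reports examined

# "At least three reports" means the first two reports are both failures.
prob_at_least_three = q ** 2

# Equivalent check: 1 - P(x = 1) - P(x = 2)
prob_check = 1 - p - q * p

print(round(mean_reports, 2))         # 2.86
print(round(prob_at_least_three, 4))  # 0.4225
```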

Try It 4.18

An instructor feels that 15% of students get below a C on their final exam. They decide to look at final exams (selected randomly and replaced in the pile after reading) until they find one that shows a grade below a C. We want to know the probability that the instructor will have to examine at least ten exams until they find one with a grade below a C. What is the probability question stated mathematically?

Example 4.19

Suppose that you are looking for a student at your college who lives within five miles of you. You know that 55% of the 25,000 students do live within five miles of you. You randomly contact students from the college until one says they live within five miles of you. What is the probability that you need to contact four people?

This is a geometric problem because you may have a number of failures before you have the one success you desire. Also, the probability of a success stays the same each time you ask a student if they live within five miles of you. There is no definite number of trials (number of times you ask a student).

Problem

a. Let X = the number of ____________ you must ask ____________ one says yes.

b. What values does X take on?

c. What are p and q?

d. The probability question is P(_______).

Try It 4.19

You need to find a store that carries a special printer ink. You know that of the stores that carry printer ink, 10% of them carry the special ink. You randomly call each store until one has the ink you need. What are p and q?

Notation for the Geometric: G = Geometric Probability Distribution Function

X ~ G(p)

Read this as "X is a random variable with a geometric distribution." The parameter is p; p = the probability of a success for each trial.

CASE I: Random Variable X Is Event of First Success

In this case we ask, “What is the probability that the first success occurs on trial x, that is, after x - 1 failures?”

The geometric pdf tells us the probability that the first occurrence of success requires x independent trials, each with success probability p and failure probability $1-p$. If the probability of success on each trial is p, then the probability that the xth trial (out of x trials) is the first success is:

$P(X = x) = (1-p)^{x-1}p$

for x = 1, 2, 3, ....

Like the binomial distribution, the geometric distribution has a mean and a standard deviation, both determined by the parameter p. The expected value of X, the mean for Case I, is $\mu = \frac{1}{p}$. This tells us how many trials to expect until we get the first success, counting the trial that results in success. The above form of the geometric distribution is used for modeling the number of trials until the first success: x = all trials including the one that is a success. This can be seen in the form of the formula. If X = number of trials including the success, then we must raise the probability of failure, $1-p$, to the power of the number of failures, $x-1$, and multiply by p for the final success. The standard deviation of Case I of the geometric distribution is:

$\sigma = \sqrt{\frac{1}{p}\left(\frac{1}{p}-1\right)}$

CASE II: Random Variable X Is Number of Failures BEFORE a Success

In contrast to Case I, the following form of the geometric distribution is used for modeling the number of failures before the first success:

$P(X = x) = (1-p)^{x}p$

for x = 0, 1, 2, 3, ....

In this case the trial that is the success is not counted as a trial in the formula: x = number of failures. The expected value, the mean, of this distribution is $\mu = \frac{1-p}{p}$. This tells us how many failures to expect before we have a success. In either case, the sequence of probabilities is a geometric sequence. In Case II, the standard deviation is $\sigma = \frac{\sqrt{1-p}}{p}$.
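
The two cases describe the same experiment counted in two ways, which a short comparison makes concrete. This is a sketch under stated assumptions: p = 0.25 is an arbitrary illustrative value, and the function names are not from the text.

```python
import math

def case1_pmf(x: int, p: float) -> float:
    """Case I: x = trial on which the first success occurs (x = 1, 2, 3, ...)."""
    return (1 - p) ** (x - 1) * p

def case2_pmf(x: int, p: float) -> float:
    """Case II: x = number of failures before the first success (x = 0, 1, 2, ...)."""
    return (1 - p) ** x * p

p = 0.25  # illustrative value, not taken from the text

# Three failures then a success: trial 4 in Case I, x = 3 in Case II.
same_event_case1 = case1_pmf(4, p)
same_event_case2 = case2_pmf(3, p)

mean_case1 = 1 / p               # expected number of trials
mean_case2 = (1 - p) / p         # expected number of failures
sd_both = math.sqrt(1 - p) / p   # shifting a count by one does not change its spread

print(same_event_case1, same_event_case2)
print(mean_case1, mean_case2, round(sd_both, 4))
```

The means differ by exactly one (the successful trial), while the standard deviation is the same under either definition.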

Figure 4.4 Bar graph of a geometric probability distribution with p = 0.02 and common ratio r = 0.98.

The y-axis in Figure 4.4 contains the probability of x, and the x-axis is the number of components tested until the first defective one is found. For example, at x = 1 the probability that the first component tested is defective is 0.02. With two components tested, the probability that the second component is the first defective one is graphed at a probability of 0.0196 at x = 2 on the x-axis. For the probability that the third component is the first defective one we find P(X = 3) = 0.019208.

Notice on Figure 4.4 that the probabilities decline by the same factor with each unit increase in the value of x. This factor is called the common ratio, and a constant ratio between consecutive probabilities is characteristic of the geometric probability distribution. The common ratio, called r, can be calculated by dividing any value by the previous value, e.g., $\frac{P(x=5)}{P(x=4)} = \frac{0.018447}{0.018823} = 0.98$.

For this set of data, therefore, the common ratio is 0.98. Multiplying any probability value by the common ratio gives the next probability value in the sequence. For example, the probability that the sixth component tested is the first defective one is 0.018078. Check this using the formula from Case I. Now we have P(x = 6). Knowing this and the common ratio we can calculate P(x = 7) by simple multiplication: P(x = 7) = 0.018078 × 0.98 = 0.017716, which agrees to four decimal places with the value 0.0177 from the geometric probability formula. A sequence in which every term is the previous term times a fixed ratio is called a geometric progression, hence the name of this probability density function. Once the common ratio is calculated, any $P(x = x_a)$ one desires to know can be easily found.
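
The common-ratio idea translates directly into a loop. The sketch below builds the first seven Case I probabilities for p = 0.02 by repeated multiplication; the variable names are illustrative. Last-digit differences from hand calculations can appear because the code carries full precision instead of rounding at each step.

```python
p = 0.02
r = 1 - p            # common ratio between consecutive probabilities

# Build P(x = 1), ..., P(x = 7) by repeated multiplication.
probs = [p]                      # P(x = 1) = p
for _ in range(6):
    probs.append(probs[-1] * r)  # P(x + 1) = r * P(x)

for x, prob in enumerate(probs, start=1):
    print(x, round(prob, 6))
```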

The number of components that you would expect to test until you find the first defective component is the mean, μ = 50, for this case of defective components. The formula for the mean of the geometric distribution for the random variable defined as the number of trials until the first success (Case I) is $\mu = \frac{1}{p} = \frac{1}{0.02} = 50$.

See Example 4.20 for an example where the geometric random variable is defined as the number of trials until the first success. When the random variable is instead defined as the number of failures before the first success (Case II), the expected value is smaller by one, $\frac{1-p}{p} = 49$, but the variance and standard deviation are unchanged, because shifting a random variable by a constant does not change its spread. For the Case I form,

$P(X = x) = (1-p)^{x-1}p$

for x = 1, 2, 3, ....

The formula for the variance is
$\sigma^2 = \frac{1}{p}\left(\frac{1}{p}-1\right) = \frac{1}{0.02}\left(\frac{1}{0.02}-1\right) = 2{,}450$
The standard deviation is
$\sigma = \sqrt{\frac{1}{p}\left(\frac{1}{p}-1\right)} = \sqrt{\frac{1}{0.02}\left(\frac{1}{0.02}-1\right)} = 49.5$
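
A quick simulation offers a sanity check on the mean of 50 and standard deviation of 49.5. This is a sketch, not part of the text's method: it draws geometric values by inverse-transform sampling, and the seed, sample size, and function name `sample_geometric` are arbitrary assumptions.

```python
import math
import random

def sample_geometric(p: float, rng: random.Random) -> int:
    """Draw the trial number of the first success by inverse-transform sampling."""
    u = 1.0 - rng.random()   # uniform on (0, 1], which avoids log(0)
    return math.floor(math.log(u) / math.log(1 - p)) + 1

rng = random.Random(12345)   # fixed seed (arbitrary) so the run is repeatable
p = 0.02
draws = [sample_geometric(p, rng) for _ in range(200_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
sample_sd = math.sqrt(sample_var)

print(sample_mean, sample_sd)   # should land near 50 and 49.5
```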

Example 4.20

Assume that the probability of a defective computer component is 0.02. Components are randomly selected. Find the probability that the first defect is found in the seventh component tested. How many components do you expect to test until one is found to be defective?

Let X = the number of computer components tested until the first defect is found.

X takes on the values 1, 2, 3, ... where p = 0.02.

Find P(x = 7). P(x = 7) = 0.0177.

Using the TI-83, 83+, 84, 84+ Calculator

To find the probability that x = 7,

  • Enter 2nd, DISTR
  • Scroll down and select geometpdf(
  • Press ENTER
  • Enter 0.02, 7); press ENTER to see the result: P(x = 7) = 0.0177

To find the probability that x ≤ 7, follow the same instructions EXCEPT select E:geometcdf( as the distribution function.

The probability that the seventh component is the first defect is 0.0177.

The formula for the variance is $\sigma^2 = \left(\frac{1}{p}\right)\left(\frac{1}{p}-1\right) = \left(\frac{1}{0.02}\right)\left(\frac{1}{0.02}-1\right) = 2{,}450$

The standard deviation is $\sigma = \sqrt{\left(\frac{1}{p}\right)\left(\frac{1}{p}-1\right)} = \sqrt{\left(\frac{1}{0.02}\right)\left(\frac{1}{0.02}-1\right)} = 49.5$
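
For readers without a TI calculator, the same two quantities can be computed in Python. This is a minimal sketch: the functions below are hand-written stand-ins that borrow the calculator's names and argument order (p first, then x) for readability; they are not part of any Python library.

```python
def geometpdf(p: float, x: int) -> float:
    """P(X = x): first success on trial x, with the calculator's argument order."""
    return (1 - p) ** (x - 1) * p

def geometcdf(p: float, x: int) -> float:
    """P(X <= x) = 1 - (1 - p)^x: at least one success in the first x trials."""
    return 1 - (1 - p) ** x

print(round(geometpdf(0.02, 7), 4))  # 0.0177
print(round(geometcdf(0.02, 7), 4))  # 0.1319
```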

Try It 4.20

The probability of a defective steel rod is 0.01. Steel rods are selected at random. Find the probability that the first defect occurs on the ninth steel rod. Use the TI-83+ or TI-84 calculator to find the answer.

Example 4.21

Problem

The lifetime risk of developing cancer is about one in 67 (1.5%). Let X = the number of people you ask until one says they have cancer. Then X is a discrete random variable with a geometric distribution: X ~ G(1/67) or X ~ G(0.015).

  1. What is the probability that you ask ten people before one says they have cancer?
  2. What is the probability that you must ask 20 people?
  3. Find the (i) mean and (ii) standard deviation of X.

Try It 4.21

The literacy rate for a nation measures the proportion of people age 15 and over who can read and write. The literacy rate for women in Afghanistan is 12%. Let X = the number of Afghani women you ask until one says that she is literate.

  1. What is the probability distribution of X?
  2. What is the probability that you ask five women before one says she is literate?
  3. What is the probability that you must ask ten women?
  4. Find the (i) mean and (ii) standard deviation of X.
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/introductory-statistics-2e/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/introductory-statistics-2e/pages/1-introduction

© Jul 18, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.