There are four main characteristics of a geometric experiment.
- A trial is repeated until a success occurs. Think of this as one or more Bernoulli trials with all failures except the last one, which is a success. In other words, you keep repeating what you are doing until the first success. Then you stop. For example, you throw a dart at a bullseye until you hit the bullseye. The first time you hit the bullseye is a "success," so you stop throwing the dart. It might take six tries until you hit the bullseye. You can think of the trials as failure, failure, failure, failure, failure, success, STOP. In theory, the number of trials could go on forever.
- The repeated trials are independent of each other.
- The probability, p, of a success and the probability, q, of a failure are the same for each trial: $p + q = 1$ and $q = 1 - p$. For example, the probability of rolling a three when you throw one fair die is $\frac{1}{6}$. This is true no matter how many times you roll the die. Suppose you want to know the probability of getting the first three on the fifth roll. On rolls one through four, you do not get a face with a three. The probability for each of those rolls is $q = \frac{5}{6}$, the probability of a failure. The probability of getting a first three on the fifth roll is $\left(\frac{5}{6}\right)\left(\frac{5}{6}\right)\left(\frac{5}{6}\right)\left(\frac{5}{6}\right)\left(\frac{1}{6}\right) = 0.0804$.
- The random variable X represents the number of the trial in which the first success occurs. That is, X = the number of independent trials until the first success.
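The die example in the list above can be checked numerically. The following is a minimal sketch; the function name `first_success_prob` is an illustrative choice, not notation from the text, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

def first_success_prob(p, x):
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    q = 1 - p                    # probability of failure on any one trial
    return q ** (x - 1) * p      # x - 1 failures followed by one success

# First three on the fifth roll of a fair die: four failures, then a success.
p_three = Fraction(1, 6)
prob = first_success_prob(p_three, 5)
print(prob, round(float(prob), 4))
```

Changing the second argument shows how quickly the probability falls as the first success is pushed to later trials.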
The following are additional attributes of the geometric distribution:
- The random variable is discrete. The random variable may be defined in two ways depending upon the analyst’s interest. In the example above for throwing a die, the question was “What is the probability that the first success will be on the fifth throw?” Alternatively, the question could be asked as “What is the probability that it takes four failures before a success?” We will see that each way of asking the question will alter the form of the geometric probability density function slightly and will change the mean and standard deviation of the geometric pdf.
- Implicit in the random variable is that the probability of a success is constant and therefore so is the probability of a failure. For flipping a coin this is obvious, but in experiments that require skill, such as hitting a baseball or throwing a dart, one might consider that learning during the experiment would alter the probability of a success. The geometric distribution cannot capture “learning,” so the probability of success is assumed to stay constant at its historical value.
- The geometric distribution is “memoryless.” Very few probability density functions are memoryless, and the geometric distribution is the only one with a discrete random variable that is. As an example, historically Major League Baseball player Jones has a record of hitting the ball for at least an advance to first base with a probability of 0.20. Jones has not had a hit in his last 10 times at bat. What is the probability that Jones will get a hit in his third time at bat? The answer ignores his 10 previous failures. All events prior to the current trial are irrelevant, which is why the distribution is called “memoryless.”
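Formally: $P(X = k + x \mid X > k) = P(X = x)$, where k = number of previous failures and x = the number of further trials up to and including the first success.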
Jones’s probability of a hit begins anew each time he comes to bat. This feature of the geometric distribution leads to a curious result: when parts are drawn from a manufacturing process to test for defects, the geometric distribution begins with a clean slate each time the tests begin, with no consideration of previous test results. More on this when we get to the exponential probability density function.
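The memoryless property can be checked numerically for Jones. The sketch below uses the tail form of the property, P(X > k + n | X > k) = P(X > n), with P(X > n) = (1 − p)^n because the first n at-bats must all be failures. The function name is hypothetical.

```python
def prob_first_success_after(p, n):
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n

p = 0.20   # Jones's historical probability of a hit
k = 10     # at-bats already observed without a hit

# Compare the conditional tail probability (given k failures so far)
# with the unconditional one for a fresh start.
for n in range(1, 4):
    conditional = prob_first_success_after(p, k + n) / prob_first_success_after(p, k)
    unconditional = prob_first_success_after(p, n)
    print(n, round(conditional, 6), round(unconditional, 6))
```

The two printed columns agree for every n, which is the sense in which the 10 earlier failures carry no information.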
Example 4.17
You play a game of chance that you can either win or lose (there are no other possibilities) until you lose. Your probability of losing is p = 0.57. What is the probability that it takes five games until you lose? Let X = the number of games you play until you lose (includes the losing game). Then X takes on the values 1, 2, 3, ... (could go on indefinitely). The probability question is P(x = 5).
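One way to build intuition for P(x = 5) is to simulate the game many times and compare the observed frequency with the value from multiplying four wins by one loss. This is a sketch only; the seed, the number of runs, and the function name are arbitrary choices made for illustration.

```python
import random

def games_until_loss(p_lose, rng):
    """Play independent games until the first loss; return the number played."""
    games = 1
    while rng.random() >= p_lose:   # a win, so play another game
        games += 1
    return games

p_lose = 0.57
exact = (1 - p_lose) ** 4 * p_lose   # four wins, then the first loss

rng = random.Random(0)               # fixed seed so runs are repeatable
n_runs = 100_000
hits = sum(games_until_loss(p_lose, rng) == 5 for _ in range(n_runs))
estimate = hits / n_runs

print(round(exact, 4), round(estimate, 4))
```

The simulated frequency should land close to the exact value, and the gap shrinks as `n_runs` grows.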
Try It 4.17
You throw darts at a board until you hit the center area. Your probability of hitting the center area is p = 0.17. You want to find the probability that it takes eight throws until you hit the center. What values does X take on?
Example 4.18
Problem
A safety engineer feels that 35% of all industrial accidents in the plant are caused by failure of employees to follow instructions. They decide to look at the accident reports (selected randomly and replaced in the pile after reading) until they find one that shows an accident caused by failure of employees to follow instructions. On average, how many reports would the safety engineer expect to look at until they find a report showing an accident caused by employee failure to follow instructions? What is the probability that the safety engineer will have to examine at least three reports until they find a report showing an accident caused by employee failure to follow instructions?
Let X = the number of accidents the safety engineer must examine until they find a report showing an accident caused by employee failure to follow instructions. X takes on the values 1, 2, 3, .... The first question asks you to find the expected value or the mean. The second question asks you to find P(x ≥ 3). ("At least" translates to a "greater than or equal to" symbol).
Solution
- On average, how many reports would the safety engineer expect to look at until they find a report showing an accident caused by employee failure to follow instructions? This question asks you to find the expected value or the mean. This is the average number of reports examined, counting the report that shows the failure to follow instructions. The formula for the mean of this geometric distribution is: $\mu = \frac{1}{p} = \frac{1}{0.35} \approx 2.86$
Note that the mean does not need to be a whole number, although the random variable is discrete and must be a counting number.
- What is the probability that the safety engineer will have to examine at least three reports until they find a report showing an accident caused by employee failure to follow instructions?
This question is answered by first defining the random variable. Let X = the number of accidents the safety engineer must examine until they find a report showing an accident caused by employee failure to follow instructions. X can take on the values 1, 2, 3, 4, …, with no upper limit. Unlike the binomial distribution with a fixed number of trials, the geometric distribution may have an unlimited number of failed trials before a success occurs.
This second question asks you to find $P(x \ge 3)$. ("At least" translates to a "greater than or equal to" symbol.)
In a binomial distribution, we answered questions of “more than” or “less than” by calculating each probability individually and adding them. For “less than or equal to three,” for example, the individual probabilities are added from zero to three. If the probability of interest is “greater than or equal to” (≥) three, the individual probabilities are added from zero to two and then subtracted from one.
We cannot add over the full range of the geometric random variable because it has no upper limit, so an alternative solution is needed. A pair of formulas for the geometric distribution answers questions asking for probabilities of “more than” and “less than.”
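If the question is:
What is the probability it takes MORE THAN n events to get the first success? $P(X > n) = (1 - p)^{n}$
What is the probability it takes LESS THAN n events for the first success? $P(X < n) = 1 - (1 - p)^{n - 1}$
Here X counts the trials up to and including the first success. For the safety engineer, $P(x \ge 3) = P(x > 2) = (1 - 0.35)^{2} = 0.4225$.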
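The two tail questions can be written as small functions and applied to the safety engineer problem. This is a sketch that assumes X counts the reports examined up to and including the first success, so that "more than n" means the first n reports are all failures. The function names are illustrative.

```python
def prob_more_than(p, n):
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n

def prob_less_than(p, n):
    """P(X < n): the first success occurs within the first n - 1 trials."""
    return 1 - (1 - p) ** (n - 1)

p = 0.35                                # share of accidents due to not following instructions
mean_reports = 1 / p                    # expected number of reports examined
at_least_three = prob_more_than(p, 2)   # P(X >= 3) is the same event as P(X > 2)

print(round(mean_reports, 2))
print(round(at_least_three, 4))
```

Because "at least three" and "fewer than three" are complements, `prob_less_than(p, 3)` and `at_least_three` add to one.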
Try It 4.18
An instructor feels that 15% of students get below a C on their final exam. They decide to look at final exams (selected randomly and replaced in the pile after reading) until they find one that shows a grade below a C. We want to know the probability that the instructor will have to examine at least ten exams until they find one with a grade below a C. What is the probability question stated mathematically?
Example 4.19
Suppose that you are looking for a student at your college who lives within five miles of you. You know that 55% of the 25,000 students do live within five miles of you. You randomly contact students from the college until one says they live within five miles of you. What is the probability that you need to contact four people?
This is a geometric problem because you may have a number of failures before you have the one success you desire. Also, the probability of a success stays the same each time you ask a student if they live within five miles of you. There is no definite number of trials (number of times you ask a student).
Problem
a. Let X = the number of ____________ you must ask ____________ one says yes.
b. What values does X take on?
c. What are p and q?
d. The probability question is P(_______).
Solution
a. Let X = the number of students you must ask until one says yes.
b. 1, 2, 3, …, (total number of students)
c. p = 0.55; q = 0.45
d. P(x = 4)
Try It 4.19
You need to find a store that carries a special printer ink. You know that of the stores that carry printer ink, 10% of them carry the special ink. You randomly call each store until one has the ink you need. What are p and q?
Notation for the Geometric: G = Geometric Probability Distribution Function
X ~ G(p)
Read this as "X is a random variable with a geometric distribution." The parameter is p; p = the probability of a success for each trial.
CASE I: Random Variable X Is Event of First Success
In this case we ask, “What is the probability that the first success occurs on the xth trial, that is, after x − 1 failures?”
The geometric pdf tells us the probability that the first occurrence of success requires x independent trials, each with success probability p. If the probability of success on each trial is p, then the probability that the xth trial (out of x trials) is the first success is:
$P(X = x) = (1 - p)^{x - 1}p$ for $x = 1, 2, 3, \ldots$
Like the binomial distribution, the geometric distribution has a mean and a standard deviation. The expected value of X, the mean for Case I, is $\mu = \frac{1}{p}$. This tells us how many trials to expect until we get the first success, counting the trial that results in success. The above form of the geometric distribution is used for modeling the number of trials until the first success, where x = all trials including the one that is a success. This can be seen in the form of the formula: if X = number of trials including the success, then the probability of failure, $(1 - p)$, appears as a factor once for each failure, that is, $x - 1$ times. The standard deviation of Case I of the geometric distribution is: $\sigma = \sqrt{\frac{1}{p}\left(\frac{1}{p} - 1\right)}$
CASE II: Random Variable X Is Number of Failures BEFORE a Success
By contrast to Case I, the following form of the geometric distribution is used for modeling the number of failures before the first success:
$P(X = x) = (1 - p)^{x}p$ for $x = 0, 1, 2, \ldots$
In this case the trial that is the success is not counted as a trial in the formula: x = number of failures. The expected value, the mean, of this distribution is $\mu = \frac{1 - p}{p}$. This tells us how many failures to expect before we have a success. In either case, the sequence of probabilities is a geometric sequence. In Case II, the standard deviation is: $\sigma = \sqrt{\frac{1 - p}{p^{2}}}$
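The two cases describe the same experiment with different bookkeeping, and a short side-by-side sketch makes that concrete. It uses the standard forms (1 − p)^(x − 1)·p for the trial of the first success and (1 − p)^x·p for the number of failures before it. The function names are hypothetical.

```python
import math

def pmf_case1(p, x):
    """Case I: x = trial on which the first success occurs (x = 1, 2, 3, ...)."""
    return (1 - p) ** (x - 1) * p

def pmf_case2(p, x):
    """Case II: x = number of failures before the first success (x = 0, 1, 2, ...)."""
    return (1 - p) ** x * p

def summary(p):
    """Return (Case I mean, Case II mean, standard deviation shared by both cases)."""
    mean1 = 1 / p
    mean2 = (1 - p) / p
    sd = math.sqrt(1 - p) / p
    return mean1, mean2, sd

p = 1 / 6   # fair-die example: success = rolling a three
print(pmf_case1(p, 5), pmf_case2(p, 4))   # same event, described two ways
print(summary(p))
```

"First success on trial 5" and "4 failures before the first success" get the same probability, the means differ by exactly one, and the standard deviation is unchanged.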
The y-axis in Figure 4.4 contains the probability of x, and the x-axis shows the number of components tested, for components that are each defective with probability p = 0.02. For example, at $x = 1$ the probability that the component is found to be defective is 0.02. With two components tested, the probability that the second component is the first defective one is graphed at a probability of 0.0196 at $x = 2$ on the x-axis. For the probability that the third component is the first defective one, we find $P(x = 3) = 0.0192$. (Rounded to two decimal places, the first two values are the same.)
Notice on Figure 4.4 that the probabilities decline by the same factor with each one-unit increase in the value of x. This factor is called the common ratio, and a constant ratio of this kind is characteristic of the geometric probability distribution. The common ratio, called r, can be calculated by dividing any value by the previous value, e.g., $r = \frac{P(x = 2)}{P(x = 1)} = \frac{0.0196}{0.02} = 0.98$
For this set of data, therefore, the common ratio is 0.98. Multiplying any probability value by the common ratio gives the next probability value in the sequence. For example, the probability that the sixth component tested is the first defective one is 0.018078. Check this using the formula from Case I. Now we have $P(x = 6)$. Knowing this and the common ratio, we can calculate $P(x = 7)$ by simple multiplication: $0.018078 \times 0.98 = 0.0177$, the same value obtained by using the geometric probability distribution directly. A sequence in which every pair of consecutive terms has the same ratio is called a geometric progression, and thus the name for this probability density function. Once the common ratio is calculated, any probability one desires to know can be easily found.
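The common-ratio idea translates directly into a loop: start from the first probability and multiply by r = 1 − p to get each next one. This is a minimal sketch using p = 0.02, with X counted as the trial on which the first defective component is found.

```python
p = 0.02      # probability a component is defective
r = 1 - p     # common ratio between consecutive probabilities

# Build P(x = 1), P(x = 2), ..., P(x = 7) by repeated multiplication by r.
probs = [p]
for _ in range(6):
    probs.append(probs[-1] * r)

for x, prob in enumerate(probs, start=1):
    print(x, round(prob, 6))
```

Dividing any entry of `probs` by the entry before it returns 0.98, up to floating-point rounding.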
The number of components that you would expect to test until you find the first defective component is the mean, $\mu = \frac{1}{p} = \frac{1}{0.02} = 50$ for this case of defective components. The formula for the mean of the geometric distribution for the random variable defined as number of failures before first success is $\mu = \frac{1 - p}{p} = \frac{0.98}{0.02} = 49$
See Example 4.20 for an example where the geometric random variable is defined as number of trials until first success. The expected value for that version of the geometric distribution is larger by one, because it also counts the successful trial. Case II also has a variance. Subtracting the one successful trial from every count does not change the spread, so the variance is the same as in Case I:
$\sigma^{2} = \frac{1 - p}{p^{2}} = \frac{1}{p}\left(\frac{1}{p} - 1\right)$
Example 4.20
Assume that the probability of a defective computer component is 0.02. Components are randomly selected. Find the probability that the first defect is found on the seventh component tested. How many components do you expect to test until one is found to be defective?
Let X = the number of computer components tested until the first defect is found.
X takes on the values 1, 2, 3, ... where p = 0.02.
Find P(x = 7). $P(x = 7) = (1 - 0.02)^{7 - 1}(0.02) = 0.0177$.
Using the TI-83, 83+, 84, 84+ Calculator
To find the probability that x = 7,
- Enter 2nd, DISTR.
- Scroll down and select geometpdf(.
- Press ENTER.
- Enter 0.02, 7); press ENTER to see the result: P(x = 7) = 0.0177.
To find the probability that x ≤ 7, follow the same instructions EXCEPT select E:geometcdf( as the distribution function.
The probability that the seventh component is the first defect is 0.0177.
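The expected number of components tested until one is found to be defective is $\mu = \frac{1}{p} = \frac{1}{0.02} = 50$.
The formula for the variance is $\sigma^{2} = \frac{1}{p}\left(\frac{1}{p} - 1\right) = \frac{1}{0.02}\left(\frac{1}{0.02} - 1\right) = 2{,}450$
The standard deviation is $\sigma = \sqrt{\frac{1}{p}\left(\frac{1}{p} - 1\right)} = \sqrt{\frac{1}{0.02}\left(\frac{1}{0.02} - 1\right)} = 49.5$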
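For readers without a TI calculator, the same quantities can be computed in a few lines of Python. This is a sketch, not a replacement for the calculator steps. The names `geom_pmf` and `geom_cdf` are hypothetical stand-ins for the calculator's point and cumulative functions, with X counted as the trial of the first defect.

```python
import math

def geom_pmf(p, x):
    """P(X = x): first success on trial x, for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

def geom_cdf(p, x):
    """P(X <= x): first success occurs on or before trial x."""
    return 1 - (1 - p) ** x

p = 0.02
print(round(geom_pmf(p, 7), 4))    # probability the seventh component is the first defect
print(round(geom_cdf(p, 7), 4))    # probability the first defect is found within seven tests

mean = 1 / p
variance = (1 / p) * (1 / p - 1)
sd = math.sqrt(variance)
print(round(mean), round(variance), round(sd, 1))
```

The cumulative value is the sum of the point probabilities for x = 1 through 7, which is a quick consistency check on both functions.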
Try It 4.20
The probability of a defective steel rod is 0.01. Steel rods are selected at random. Find the probability that the first defect occurs on the ninth steel rod. Use the TI-83+ or TI-84 calculator to find the answer.
Example 4.21
Problem
The lifetime risk of developing cancer is about one in 67 (1.5%). Let X = the number of people you ask until one says they have cancer. Then X is a discrete random variable with a geometric distribution: X ~ G(1/67) or X ~ G(0.015)
- What is the probability that you ask ten people before one says they have cancer?
- What is the probability that you must ask 20 people?
- Find the (i) mean and (ii) standard deviation of X.
Solution
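The three parts can be set up as follows. This is a sketch under stated assumptions, not the worked solution: it takes p = 1/67, and it reads "ask ten people" and "ask 20 people" as the first "yes" coming from the tenth and the twentieth person, that is, P(x = 10) and P(x = 20). The function name is illustrative.

```python
import math

p = 1 / 67   # lifetime risk stated in the problem (assumed exact here)

def first_yes_on(p, x):
    """P(X = x): the first person to say yes is person number x."""
    return (1 - p) ** (x - 1) * p

prob_10 = first_yes_on(p, 10)            # assumed reading of the first question
prob_20 = first_yes_on(p, 20)            # assumed reading of the second question
mean = 1 / p                             # expected number of people asked
sd = math.sqrt((1 / p) * (1 / p - 1))    # standard deviation of the number asked

print(prob_10, prob_20)
print(mean, sd)
```

If the first question is instead read as ten "no" answers before the first "yes," the exponent changes from x − 1 to x and the answer uses the number-of-failures form.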
Try It 4.21
The literacy rate for a nation measures the proportion of people age 15 and over who can read and write. The literacy rate for women in Afghanistan is 12%. Let X = the number of Afghani women you ask until one says that she is literate.
- What is the probability distribution of X?
- What is the probability that you ask five women before one says she is literate?
- What is the probability that you must ask ten women?
- Find the (i) mean and (ii) standard deviation of X.