- Bayes’ Theorem
- a method used to calculate a conditional probability when additional information is obtained to refine a probability estimate
- binomial distribution
- a probability distribution for a discrete random variable that counts the number of successes in n independent trials, where each trial has two possible outcomes (success or failure) and the probability of success is the same in every trial
- central tendency
- the statistical measure that represents the center of a dataset
- coefficient of variation (CV)
- a measure of the relative variation of a dataset, calculated by expressing the standard deviation as a percentage of the mean
- complement of an event
- the set of all outcomes in the sample space that are not included in the event
- conditional probability
- the probability of an event given that another event has already occurred
- continuous probability distribution
- a probability distribution for random variables that can take on any value within a given range or interval
- continuous random variable
- a random variable that can take on any value within an interval, so that its possible values are uncountably infinite
- dependent events
- events where the probability of occurrence of one event is affected by the occurrence of another event
- descriptive statistics
- the organization, summarization, and graphical display of data
- discrete probability distribution
- a probability distribution for random variables that take on a finite or countably infinite number of distinct values
- discrete random variable
- a random variable that can take on only a finite or countably infinite number of values
- empirical probability
- a probability that is calculated based on data that has been collected from an experiment
- empirical rule
- a rule that provides the percentages of data values falling within one, two, and three standard deviations from the mean for a bell-shaped (normal) distribution
- event
- a subset of the sample space
- frequency
- a count of the number of times that an event or observation occurs in an experiment or study
- frequency distribution
- a method of organizing and summarizing a dataset that provides the frequency with which each value in the dataset occurs
- independent events
- events where the probability of occurrence of one event is not affected by the occurrence of another event
- interquartile range (IQR)
- a number that indicates the spread of the middle half, or middle 50%, of the data; the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$)
- mean (also called arithmetic mean)
- a measure of center of a dataset, calculated by adding up the data values and dividing the sum by the number of data values; average
- median
- the middle value in an ordered dataset
- mode
- the most frequently occurring data value in a dataset
- mutually exclusive events
- events that cannot occur at the same time
- normal distribution
- a symmetric, bell-shaped probability distribution that is used to model many measurements, such as IQ scores, salaries, heights, weights, and blood pressures
- outcome
- the result of a single trial in a probability experiment
- outliers
- data values that are significantly different from the other data values in a dataset
- percentiles
- numbers that divide an ordered dataset into hundredths; used to describe the relative standing of a particular value within a dataset by indicating the percentage of data points that fall below it
- Poisson distribution
- a probability distribution for a discrete random variable that counts the number of occurrences of an event in a specific interval of time or space
- population data
- data representing all the outcomes or measurements that are of interest
- population mean
- the average for all measurements of interest corresponding to the entire group under study
- population size
- the number of measurements for the entire group under study
- probability
- a numerical measure that assesses the likelihood of occurrence of an event
- probability analysis
- the use of probability to model, understand, and quantify uncertainty, allowing data scientists to make informed decisions from data
- probability density function (PDF)
- a function that is used to describe the probability distribution of a continuous random variable
- probability distribution
- a mathematical function that assigns probabilities to the possible outcomes of an experiment
- probability mass function (PMF)
- a function that is used to define the probability distribution of a discrete random variable
- quartiles
- numbers that divide an ordered dataset into quarters; the second quartile is the same as the median
- random variable
- a variable that assigns a single numerical value to each outcome of an experiment
- range
- a measure of dispersion for a dataset calculated by subtracting the minimum from the maximum of the dataset
- relative frequency probability
- a method of determining the likelihood of an event occurring based on the observed frequency of its occurrence in a given sample or population
- sample data
- data representing outcomes or measurements collected from a subset or part of a population
- sample mean
- the average for a subset of the measurements of interest
- sample size
- the number of measurements for the subset taken from the overall population
- sample space
- the set of all possible outcomes in a probability experiment
- standard deviation
- a measure of the spread of a dataset, given in the same units as the data, that indicates how far a typical data value is from the mean
- standard normal distribution
- a normal distribution with mean of 0 and standard deviation of 1
- statistical analysis
- the science of collecting, organizing, and interpreting data to make decisions
- theoretical probability
- a probability that is calculated based on an assessment of equally likely outcomes
- trimmed mean
- the mean of a dataset calculated after some percentage of the data values has been removed from both the lower and upper ends of the ordered dataset; typically used to mitigate the effects of outliers on the mean
- variance
- a measure of the spread of data values in a dataset, calculated as the average of the squared deviations of the data values from the mean
- z-score
- a measure of the position of a data value in the dataset, calculated by subtracting the mean from the data value and then dividing the difference by the standard deviation
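
Several of the terms above (mean, median, mode, range, variance, standard deviation, coefficient of variation, quartiles, interquartile range, and z-score) are defined by short calculations, so a small worked example may help tie them together. The following is a minimal sketch using Python's standard `statistics` module. The dataset is hypothetical and chosen only for illustration, and the sketch assumes the data is treated as a full population (dividing by the number of values); a sample variance would instead divide by one less than the sample size. Quartile conventions also differ between textbooks and software, so the `"inclusive"` method used here is one assumption among several reasonable choices.

```python
import statistics

# Hypothetical dataset, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of spread, treating the data as a whole population (assumption).
data_range = max(data) - min(data)
variance = statistics.pvariance(data)   # average squared deviation from the mean
std_dev = statistics.pstdev(data)       # square root of the population variance

# Coefficient of variation: standard deviation as a percentage of the mean.
cv = std_dev / mean * 100

# Quartiles and interquartile range; the "inclusive" method is one of
# several conventions, so other tools may report slightly different values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# z-score of the largest data value: (value - mean) / standard deviation.
z = (max(data) - mean) / std_dev

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance}, std_dev={std_dev}")
print(f"CV={cv}%, Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}, z={z}")
```

For this dataset the mean is 5 and the population standard deviation is 2, so the coefficient of variation is 40% and the largest value, 9, has a z-score of 2. The second quartile equals the median (4.5), as the quartiles entry states.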