Bayes’ Theorem
a method used to calculate a conditional probability when additional information is obtained to refine a probability estimate
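As an illustration of refining a probability estimate with new information, the following minimal Python sketch applies Bayes' Theorem in the form P(A|B) = P(B|A)P(A) / P(B). The function name, parameter names, and the numbers used (a 1% prior, a 95% likelihood, and a 5% false positive rate) are hypothetical values chosen for the example, not figures from the text.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A), and P(B | not A)."""
    # Total probability of observing the evidence B.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical inputs: P(A) = 0.01, P(B | A) = 0.95, P(B | not A) = 0.05.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"Updated probability of A given B: {posterior:.3f}")
```

With these assumed inputs the updated probability is still well below the 95% likelihood, because the prior probability of A is small.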
binomial distribution
a probability distribution for a discrete random variable that is the number of successes in n independent trials. Each of these trials has two possible outcomes, success or failure, with the probability of success in each trial the same.
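A short sketch of the binomial probability of exactly k successes in n trials, using the standard formula C(n, k) p^k (1 − p)^(n − k). The function name and the coin-toss numbers are assumptions made for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: exactly 3 heads in 10 tosses of a fair coin.
prob_three_heads = binomial_pmf(3, 10, 0.5)
print(prob_three_heads)
```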
central tendency
the statistical measure that represents the center of a dataset
coefficient of variation (CV)
a measure of the variation of a dataset, calculated by expressing the standard deviation as a percentage of the mean
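A minimal sketch of the coefficient of variation using Python's standard `statistics` module. The dataset is made up for the example, and the population standard deviation (`pstdev`) is an assumption; for sample data, `stdev` would be used instead.

```python
import statistics

# Hypothetical dataset, treated here as a full population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
std_dev = statistics.pstdev(data)  # population standard deviation

# Standard deviation expressed as a percentage of the mean.
cv_percent = std_dev / mean * 100
print(f"CV = {cv_percent:.1f}%")
```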
complement of an event
the set of all outcomes in the sample space that are not included in the event
conditional probability
the probability of an event given that another event has already occurred
continuous probability distribution
probability distribution that deals with random variables that can take on any value within a given range or interval
continuous random variable
a random variable that can take on an uncountably infinite number of values, such as any value within an interval
dependent events
events where the probability of occurrence of one event is affected by the occurrence of another event
descriptive statistics
the organization, summarization, and graphical display of data
discrete probability distribution
probability distribution associated with variables that take on a finite or countably infinite number of distinct values
discrete random variable
a random variable where there is only a finite or countably infinite number of values that the variable can take on
empirical probability
a probability that is calculated based on data that has been collected from an experiment
empirical rule
a rule that provides the percentages of data values falling within one, two, and three standard deviations from the mean for a bell-shaped (normal) distribution
event
a subset of the sample space
frequency
a count of the number of times that an event or observation occurs in an experiment or study
frequency distribution
a method of organizing and summarizing a dataset that provides the frequency with which each value in the dataset occurs
independent events
events where the probability of occurrence of one event is not affected by the occurrence of another event
interquartile range (IQR)
a number that indicates the spread of the middle half, or middle 50%, of the data; the difference between the third quartile (Q3) and the first quartile (Q1)
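The difference between the third and first quartiles can be sketched with the standard `statistics` module. The dataset is hypothetical, and the `"inclusive"` quartile method is an assumption; other quartile conventions can give slightly different values for the same data.

```python
import statistics

# Hypothetical ordered dataset.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```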
mean (also called arithmetic mean)
a measure of center of a dataset, calculated by adding up the data values and dividing the sum by the number of data values; average
median
the middle value in an ordered dataset
mode
the most frequently occurring data value in a dataset
mutually exclusive events
events that cannot occur at the same time
normal distribution
a bell-shaped distribution curve that is used to model many measurements, including IQ scores, salaries, heights, weights, blood pressures, etc.
outcome
the result of a single trial in a probability experiment
outliers
data values that are significantly different from the other data values in a dataset
percentiles
numbers that divide an ordered dataset into hundredths; used to describe the relative standing of a particular value within a dataset by indicating the percentage of data points that fall below it
Poisson distribution
a probability distribution for discrete random variables used to calculate probabilities for a certain number of occurrences in a specific interval
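A short sketch of the Poisson probability of exactly k occurrences in an interval, using the standard formula λ^k e^(−λ) / k!, where λ is the average number of occurrences per interval. The function name and the rate of 2 occurrences per interval are assumptions for illustration.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k occurrences when the average
    number of occurrences per interval is lam."""
    return lam**k * exp(-lam) / factorial(k)


# Hypothetical example: average of 2 occurrences per interval.
prob_exactly_three = poisson_pmf(3, 2.0)
prob_at_most_one = poisson_pmf(0, 2.0) + poisson_pmf(1, 2.0)
print(prob_exactly_three, prob_at_most_one)
```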
population data
data representing all the outcomes or measurements that are of interest
population mean
the average for all measurements of interest corresponding to the entire group under study
population size
the number of measurements for the entire group under study
probability
a numerical measure that assesses the likelihood of occurrence of an event
probability analysis
the use of tools to model, understand, and quantify uncertainties, allowing data scientists to make informed decisions from data
probability density function (PDF)
a function that is used to describe the probability distribution of a continuous random variable
probability distribution
a mathematical function that assigns probabilities to various outcomes
probability mass function (PMF)
a function that is used to define the probability distribution of a discrete random variable
quartiles
numbers that divide an ordered dataset into quarters; the second quartile is the same as the median
random variable
a variable where a single numerical value is assigned to a specific outcome from an experiment
range
a measure of dispersion for a dataset calculated by subtracting the minimum from the maximum of the dataset
relative frequency probability
a method of determining the likelihood of an event occurring based on the observed frequency of its occurrence in a given sample or population
sample data
data representing outcomes or measurements collected from a subset or part of a population
sample mean
the average for a subset of the measurements of interest
sample size
the number of measurements for the subset taken from the overall population
sample space
the set of all possible outcomes in a probability experiment
standard deviation
a measure of the spread of a dataset, given in the same units as the data, that indicates how far a typical data value is from the mean
standard normal distribution
a normal distribution with mean of 0 and standard deviation of 1
statistical analysis
the science of collecting, organizing, and interpreting data to make decisions
theoretical probability
a probability that is calculated based on an assessment of equally likely outcomes
trimmed mean
a calculation for the average or mean of a dataset where some percentage of data values are removed from the lower and upper end of the dataset; typically used to mitigate the effects of outliers on the mean
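A minimal sketch of a trimmed mean, assuming the same proportion of values is dropped from each end of the sorted data. The function name, the 10% trimming proportion, and the dataset (which contains one deliberately extreme value) are hypothetical.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    drop = int(len(ordered) * proportion)  # count removed from each end
    kept = ordered[drop:len(ordered) - drop]
    return sum(kept) / len(kept)


# Hypothetical dataset with one outlier (98).
data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 98]

plain_mean = sum(data) / len(data)
mean_10_percent_trimmed = trimmed_mean(data, 0.10)
print(plain_mean, mean_10_percent_trimmed)
```

In this example the outlier pulls the ordinary mean well above most of the data values, while the trimmed mean stays close to the center of the remaining values.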
variance
a measure of the spread of data values in a dataset, calculated as the average of the squared deviations of the observations from the mean
z-score
a measure of the position of a data value in the dataset, calculated by subtracting the mean from the data value and then dividing the difference by the standard deviation
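The calculation described in this definition can be sketched as follows. The dataset is hypothetical, and using the population standard deviation (`pstdev`) is an assumption; with sample data, `stdev` would take its place.

```python
import statistics


def z_score(value, mean, std_dev):
    """Number of standard deviations that `value` lies from the mean."""
    return (value - mean) / std_dev


# Hypothetical dataset, treated here as a full population.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)
std_dev = statistics.pstdev(data)

z_for_nine = z_score(9, mean, std_dev)
z_for_four = z_score(4, mean, std_dev)
print(z_for_nine, z_for_four)
```

A positive z-score indicates a value above the mean, and a negative z-score indicates a value below it.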
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution-NonCommercial-ShareAlike License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
© Dec 19, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.