
alternative hypothesis
a complementary statement regarding an unknown population parameter used in hypothesis testing
analysis of variance (ANOVA)
statistical method to compare three or more means and determine if the means are all statistically the same or if at least one mean is different from the others
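As a minimal sketch of the idea behind ANOVA, the F statistic can be computed as the ratio of between-group variability to within-group variability. The function name `one_way_anova_f` and the three small groups are hypothetical, chosen only for illustration.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)        # mean square between groups
    ms_within = ss_within / (n_total - k)    # mean square within groups
    return ms_between / ms_within

# Hypothetical data: three groups with means 5, 7, and 9.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_stat = one_way_anova_f(groups)
print(f_stat)
```

A large F value suggests that at least one group mean differs from the others; judging significance still requires comparing F against the F distribution with the appropriate degrees of freedom.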
best-fit linear equation
an equation of the form $\hat{y} = a + bx$ that provides the best-fit straight line to the $(x, y)$ data points
bivariate data
data collected on two variables where the data values are paired with one another
bootstrapping
a method to construct a confidence interval that is based on repeated sampling and does not rely on any assumptions regarding the underlying distribution
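One common variant is the percentile bootstrap, sketched below under stated assumptions: the function name `bootstrap_ci`, the number of resamples, the fixed seed, and the sample values are all hypothetical choices for illustration, not a prescribed procedure.

```python
import random
from statistics import mean

def bootstrap_ci(data, stat=mean, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic of the data."""
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    alpha = 1 - confidence
    lower_idx = int((alpha / 2) * n_resamples)
    upper_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]

# Hypothetical sample of ten measurements.
sample = [12, 15, 11, 18, 14, 13, 16, 17, 10, 19]
low, high = bootstrap_ci(sample)
print(low, high)
```

The interval comes entirely from the resampled statistics, so no normality assumption about the population is needed.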
central limit theorem
describes the relationship between the sampling distribution of sample means and the underlying population
confidence interval
an interval where sample data is used to provide an estimate for a population parameter
confidence level
the probability that the interval estimate will contain the population parameter; equivalently, the long-run proportion of interval estimates that contain the parameter when the estimation process is repeated many times
correlation
a measure of association between two numeric variables
correlation analysis
a statistical method used to evaluate and quantify the strength and direction of the linear relationship between two quantitative variables
correlation coefficient
a measure of the strength and direction of the linear relationship between two variables
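A minimal sketch of the Pearson correlation coefficient, computed directly from deviations about the means. The helper name `pearson_r` and the data values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired numeric data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))
```

Values of r near +1 or -1 indicate a strong linear relationship, while values near 0 indicate a weak one.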
critical value
z-score that cuts off an area under the normal curve corresponding to a specified confidence level
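As a sketch, the critical value for a two-sided interval can be looked up with the standard library's `statistics.NormalDist`. The two-sided convention and the helper name `z_critical` are assumptions made for this example.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z-score leaving (1 - confidence) / 2 in each tail of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

z95 = z_critical(0.95)
z99 = z_critical(0.99)
print(round(z95, 3), round(z99, 3))
```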
dependent samples
samples from one population that can be paired or matched to the samples taken from the second population
dependent variable
in correlation analysis, the variable being studied or measured; the dependent variable is the outcome that is measured or observed to determine the impact of changes in the independent variable
F distribution
a skewed probability distribution that arises in statistical hypothesis testing, such as ANOVA
hypothesis testing
a statistical method to test claims regarding population parameters using sample data
independent samples
the sample from one population that is not related to the sample taken from the second population
independent variable
in correlation analysis, the variable that is manipulated or changed in an experiment or study; the value of the independent variable is controlled or chosen by the experimenter to observe its effect on the dependent variable
inferential statistics
statistical methods that allow researchers to infer or generalize observations from samples to the larger population from which they were selected
least squares method
a method used in linear regression that generates a straight line fit to the data values such that the sum of the squares of the residuals is as small as possible
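A minimal sketch of a least squares fit for a line of the form y = a + bx, using the closed-form expressions for slope and intercept. The function name `least_squares_fit` and the data are hypothetical.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-products divided by sum of squared x-deviations.
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_fit(xs, ys)
print(round(a, 4), round(b, 4))
```

For these values the slope works out to 0.6 and the intercept to 2.2, up to floating-point rounding.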
level of significance ($\alpha$)
the maximum allowed probability of making a Type I error; the level of significance is the probability value used to determine when the sample data indicates significant evidence against the null hypothesis
linear correlation
a measure of the association between two variables that exhibit an approximate straight-line fit when plotted on a scatterplot
margin of error
an indication of the maximum error of the estimate at a given confidence level; for a symmetric confidence interval, half the width of the interval
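As an illustrative sketch, the margin of error for a population mean can be computed as the critical value times the standard error, assuming the population standard deviation is known and the normal distribution applies. The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for a z-based interval with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z * sigma / math.sqrt(n)                   # margin of error
    return sample_mean - margin, sample_mean + margin, margin

# Hypothetical inputs: sample mean 50, known sigma 10, sample size 100.
lower, upper, margin = mean_confidence_interval(sample_mean=50.0, sigma=10.0, n=100)
print(round(lower, 2), round(upper, 2), round(margin, 2))
```

When the population standard deviation is unknown, a t-based critical value would typically replace the z-score.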
matched pairs
samples from one population that can be paired or matched to the samples taken from the second population
method of least squares
a mathematical method to generate a linear equation that is the “best fit” to the points on the scatterplot in the sense that the line minimizes the differences between the predicted values and observed values for y
modeling
the process of creating a mathematical representation that describes the relationship between different variables in a dataset; the model is then used to understand, explain, and predict the behavior of the data
nonparametric methods
statistical methods that do not rely on any assumptions regarding the underlying distribution
null hypothesis
statement of no effect or no change in the population
p-value
the probability of obtaining a sample statistic with a value as extreme as (or more extreme than) the value determined by the sample data under the assumption that the null hypothesis is true
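A minimal sketch of a p-value calculation, assuming a two-tailed test whose test statistic follows the standard normal distribution under the null hypothesis. The helper name `two_tailed_p_value` is hypothetical.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """Probability of a z-score at least as extreme as z, in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_tailed_p_value(1.96)
print(round(p, 3))
```

A p-value at or below the chosen level of significance is taken as significant evidence against the null hypothesis.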
parametric methods
statistical methods that assume a specific form for the underlying distribution
point estimate
a sample statistic used to estimate a population parameter
prediction
a forecast for the dependent variable based on a specific value of the independent variable generated using the linear model
proportion
a measure that expresses the relationship between a part and the whole; a proportion represents the fraction or percentage of a dataset that exhibits a particular characteristic or falls into a specific category
regression analysis
a statistical technique used to model the relationship between a dependent variable and one or more independent variables
residual
the difference between an observed y-value and the predicted y-value obtained from the linear regression equation
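As a short sketch, residuals are the observed values minus the values predicted by the line. The intercept 2.2 and slope 0.6 below are assumed to be the least squares coefficients for this hypothetical dataset.

```python
def residuals(xs, ys, a, b):
    """Observed y minus predicted y for the line y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data and an assumed fitted line y = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
res = residuals(xs, ys, a, b)
print([round(e, 2) for e in res])
```

A positive residual means the point lies above the line, and a negative residual means it lies below. For a least squares line with an intercept, the residuals sum to zero up to rounding.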
sample mean
a point estimate for the unknown population mean, chosen because it is an unbiased estimate of that mean
sample proportion
a point estimate for the unknown population proportion, chosen because it is an unbiased estimate of that proportion; calculated as the number of successes divided by the sample size: $\hat{p} = \frac{x}{n}$
sample statistic
a numerical summary or measure that describes a characteristic of a sample, such as a sample mean or sample proportion
sampling distribution
a probability distribution of a sample statistic based on all possible random samples of a certain size from a population or the distribution of a statistic (such as the mean) that would result from taking random samples from the same population repeatedly and calculating the statistic for each sample
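For a very small population, the sampling distribution of the mean can be listed exhaustively. The sketch below assumes sampling without replacement, and the four-value population is hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of four values.
population = [2, 4, 6, 8]
sample_size = 2

# Every possible sample of size 2 (without replacement) and its mean.
sample_means = [mean(s) for s in combinations(population, sample_size)]
print(sorted(sample_means))

# The mean of the sampling distribution matches the population mean.
print(mean(sample_means), mean(population))
```

The six sample means range from 3 to 7, and their average equals the population mean of 5, which illustrates why the sample mean is described as an unbiased estimator.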
scatterplot (or scatter diagram)
graphical display that shows values of the independent variable plotted on the x-axis and values of the dependent variable plotted on the y-axis
standard error of the mean
the standard deviation of the sample mean, calculated as the population standard deviation divided by the square root of the sample size
standardized test statistic
a numerical measure that describes how many standard deviations a particular value is from the mean of a distribution; a standardized test statistic is typically used to assess whether an observed sample statistic is significantly different from what would be expected under a null hypothesis
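One common standardized test statistic is the z statistic for a sample mean, sketched here under the assumption that the population standard deviation is known. The function name and input values are hypothetical.

```python
import math

def z_test_statistic(sample_mean, hypothesized_mean, sigma, n):
    """Number of standard errors between the sample mean and the hypothesized mean."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Hypothetical inputs: sample mean 52, null-hypothesis mean 50, sigma 10, n = 25.
z = z_test_statistic(sample_mean=52.0, hypothesized_mean=50.0, sigma=10.0, n=25)
print(z)
```

Here the standard error is 10 / 5 = 2, so the sample mean lies one standard error above the hypothesized mean.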
t-distribution
a bell-shaped, symmetric distribution similar to the normal distribution, though the t-distribution has “thicker tails” as compared to the normal distribution
test statistic
a numerical value, calculated from sample data, that is used in hypothesis testing to assess the strength of evidence against a null hypothesis
Type I error
an error made in hypothesis testing where a researcher rejects the null hypothesis when in fact the null hypothesis is actually true
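The simulation below is a rough sketch of how often a Type I error occurs when the null hypothesis is true. It assumes a two-tailed z-test on samples drawn from a standard normal population. The function name, sample size, number of trials, and seed are hypothetical choices.

```python
import math
import random
from statistics import NormalDist, mean

def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=1):
    """Fraction of simulated tests that reject a null hypothesis that is true."""
    rng = random.Random(seed)                      # fixed seed for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        # The null hypothesis (mean 0, sigma 1) is true for every sample drawn.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = mean(sample) / (1 / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1                        # rejecting here is a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(rate)
```

Over many trials the rejection rate should land close to the chosen level of significance, which is why that level is described as the maximum allowed probability of a Type I error.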
Type II error
an error made in hypothesis testing where a researcher fails to reject the null hypothesis when the null hypothesis is actually false
unbiased estimator
a statistic whose expected value equals the corresponding population parameter, so that on average it neither overestimates nor underestimates the parameter
variable
a characteristic or attribute that can be measured or observed
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution-NonCommercial-ShareAlike License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
© Dec 19, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.