A
ACF (autocorrelation function) 5.3 Time Series Forecasting Methods
adaptive learning 1.2 Data Science in Practice
Advanced NLP models 7.5 Natural Language Processing
alternative hypothesis 4.2 Hypothesis Testing
Analysis of variance (ANOVA) 4.4 Analysis of Variance (ANOVA)
application programming interface (API) 2.3 Web Scraping and Social Media Data Collection
arithmetic mean 3.1 Measures of Center
Artificial intelligence (AI) Introduction
Association of Data Scientists 8.1 Ethics in Data Collection
Association of Data Scientists (ADaSci) 1.4 Using Technology for Data Science
at least one occurrence 3.4 Probability Theory
Augmented Dickey-Fuller (ADF) test 5.3 Time Series Forecasting Methods
Autoregressive (AR) model 5.3 Time Series Forecasting Methods
autoregressive integrated moving average (ARIMA) model 5.1 Introduction to Time Series Analysis
B
Backpropagation 7.2 Backpropagation
Bayes’ Theorem 3.4 Probability Theory
BCNF (Boyce-Codd Normal Form) 2.4 Data Cleaning and Preprocessing
binary (binomial) classification 6.2 Classification Using Machine Learning
Binary cross entropy loss 7.2 Backpropagation
bootstrap aggregating (bagging) 6.5 Other Machine Learning Techniques
bootstrap samples 10.2 Validating Your Model
box-and-whisker plot 9.1 Encoding Univariate Data
Bureau of Labor Statistics (BLS) 1.3 Data and Datasets, 1 Group Project
C
Careers in Data Science 1.2 Data Science in Practice
Categorical data 1.3 Data and Datasets
central tendency 3.1 Measures of Center
Closed-ended questions 2.2 Survey Design and Implementation
Cloud computing 2.5 Handling Large Datasets
coefficient of variation (CV) 3.2 Measures of Variation
comma-separated values (CSV) 1.3 Data and Datasets
complement of an event 3.4 Probability Theory
conditional probability 3.4 Probability Theory
connecting weight 7.2 Backpropagation
consumer analysis Introduction
continuous data 1.3 Data and Datasets
Continuous probability distributions 3.5 Discrete and Continuous Probability Distributions
Convolutional layers 7.4 Convolutional Neural Networks
Convolutional neural networks (CNNs) 7.4 Convolutional Neural Networks
Correctional Offender Management Profiling for Alternative Sanctions 8.2 Ethics in Data Analysis and Modeling
correlation Introduction
customer surveys 3.1 Measures of Center
D
DALL-E and Craiyon 7.5 Natural Language Processing
Data collection and preparation Introduction
Data compression 2.5 Handling Large Datasets
data governance protocols 8.1 Ethics in Data Collection
data management 2.5 Handling Large Datasets
data preparation 1.1 What Is Data Science?
Data privacy Introduction
Data reporting 1.1 What Is Data Science?
Data Science Association 8.1 Ethics in Data Collection
Data Science Association (DSA) 1.4 Using Technology for Data Science
data science cycle 1.1 What Is Data Science?
data volumes Introduction
data warehousing 1.1 What Is Data Science?
Database management 2.5 Handling Large Datasets
DataFrame.describe() 3.2 Measures of Variation
decision tree 6.4 Decision Trees
decision-making Introduction
density-based clustering algorithm 6.2 Classification Using Machine Learning
Dependent events 3.4 Probability Theory
dependent samples 4.2 Hypothesis Testing
Depth-limiting pruning 6.4 Decision Trees
descriptive statistics Introduction
discrete data 1.3 Data and Datasets
Discrete probability distributions 3.5 Discrete and Continuous Probability Distributions
dynamic backpropagation 7.2 Backpropagation
E
empirical probability 3.4 Probability Theory
energy consumption 1.2 Data Science in Practice
engineering analysis Introduction
error-based (reduced-error) pruning 6.4 Decision Trees
ethical data collection 8.1 Ethics in Data Collection
Ethics in data science Introduction
executive (or data) dashboard 10.1 Writing an Informative Report
exploding gradient problem 7.2 Backpropagation
exponential moving average (EMA) 5.3 Time Series Forecasting Methods
Exponential transformations 2.4 Data Cleaning and Preprocessing
Extensible Markup Language (XML) 1.3 Data and Datasets
I
Independent events 3.4 Probability Theory
independent samples 4.2 Hypothesis Testing
Inferential statistics Introduction
information 1.3 Data and Datasets
information gain 6.4 Decision Trees
Information theory 6.4 Decision Trees
Initiative for Analytics and Data Science Standards 8.1 Ethics in Data Collection
Initiative for Analytics and Data Science Standards (IADSS) 1.4 Using Technology for Data Science
Integrative (I) component 5.3 Time Series Forecasting Methods
Intellectual property 8.1 Ethics in Data Collection
Internet of Things (IoT) 1.2 Data Science in Practice
M
machine learning (ML) model 6.1 What Is Machine Learning?
machine learning life cycle 6.1 What Is Machine Learning?
MAR (missing at random) data 2.4 Data Cleaning and Preprocessing
matched pairs 4.2 Hypothesis Testing
MCAR (missing completely at random) data 2.4 Data Cleaning and Preprocessing
Mean absolute percentage error (MAPE) 5.4 Forecast Evaluation Methods, 6.1 What Is Machine Learning?
mean percentage error (MPE) 10.2 Validating Your Model
Mean squared error (MSE) 6.1 What Is Machine Learning?
memory cells 7.2 Backpropagation
Minimum description length (MDL) pruning 6.4 Decision Trees
MNAR (missing not at random) data 2.4 Data Cleaning and Preprocessing
Monte Carlo simulation 10.2 Validating Your Model
Multi-way sensitivity analysis 10.2 Validating Your Model
multiclass (multinomial) classification 6.2 Classification Using Machine Learning
multilayer perceptron (MLP) 7.1 Introduction to Neural Networks
multiplicative decomposition 5.2 Components of Time Series Analysis
N
naïve Bayes classification 6.5 Other Machine Learning Techniques
National Oceanic and Atmospheric Administration (NOAA) 1.3 Data and Datasets, 1 Group Project
natural language processing 7.5 Natural Language Processing
Natural Language Toolkit (NLTK) 2.3 Web Scraping and Social Media Data Collection
nominal data 1.3 Data and Datasets
NoSQL databases 2.5 Handling Large Datasets
null hypothesis 4.2 Hypothesis Testing
Numeric data 1.3 Data and Datasets
P
P(A|B) 3.4 Probability Theory
paired samples 4.2 Hypothesis Testing
personally identifiable information (PII) 8.1 Ethics in Data Collection
Pew Research Center 1.3 Data and Datasets, 1 Group Project
Population data 3.1 Measures of Center
population size 3.1 Measures of Center
population variance 3.2 Measures of Variation
potential for bias 7.5 Natural Language Processing
Precision Medicine Initiative 1.2 Data Science in Practice
prediction interval 5.4 Forecast Evaluation Methods
predictive analytics 1.2 Data Science in Practice
principal component analysis (PCA) 6.1 What Is Machine Learning?
Probability analysis Introduction
probability experiment 3.4 Probability Theory
R
recommendation systems 1.2 Data Science in Practice
Rectified linear unit (ReLU) function 7.1 Introduction to Neural Networks
recurrent neural networks (RNNs) 7.2 Backpropagation
RedBlue.csv 6.4 Decision Trees
Regression analysis Introduction
regression models 10.2 Validating Your Model
regression tree 6.4 Decision Trees
regulatory compliance officer (RCO) 8.1 Ethics in Data Collection
Relational databases 2.5 Handling Large Datasets
Relative frequency probability 3.4 Probability Theory
Russian tank problem 6.1 What Is Machine Learning?
S
sample space 3.4 Probability Theory
sample variance 3.2 Measures of Variation
sampling (or sample) distribution 4.1 Statistical Inference and Confidence Intervals
Sarbanes-Oxley Act [SOX] 8.1 Ethics in Data Collection
scatterplot, or scatter diagram 4.3 Correlation and Linear Regression Analysis
Scenario analysis 10.2 Validating Your Model
Seasonal-Trend decomposition using LOESS (STL) 5.3 Time Series Forecasting Methods
semantic segmentation 7.4 Convolutional Neural Networks
Sensitivity analysis 10.2 Validating Your Model
simple moving average (SMA) 5.3 Time Series Forecasting Methods
Simple random selection 2.2 Survey Design and Implementation
Sparse categorical cross entropy 7.2 Backpropagation
speech recognition 7.5 Natural Language Processing
sports analytics 1.2 Data Science in Practice
square root transformation 2.4 Data Cleaning and Preprocessing
standard deviation 3.2 Measures of Variation
standardized test statistic 4.2 Hypothesis Testing
static backpropagation 7.2 Backpropagation
statistical validation 10.2 Validating Your Model
structured dataset 1.3 Data and Datasets
sum of squares 3.2 Measures of Variation
Supervised learning 6.1 What Is Machine Learning?
supervised learning cycle 6.1 What Is Machine Learning?
Symmetric mean absolute percentage error (sMAPE) 5.4 Forecast Evaluation Methods
T
test statistic 4.2 Hypothesis Testing
testing set (or data) 6.1 What Is Machine Learning?
text-to-speech (TTS) 7.5 Natural Language Processing
Theoretical probability 3.4 Probability Theory
topological data analysis (TDA) 6.1 What Is Machine Learning?
training set (or data) 6.1 What Is Machine Learning?
Transformer models 7.5 Natural Language Processing
ttest_1samp() 4.2 Hypothesis Testing
Type I error 4.2 Hypothesis Testing
Type II error 4.2 Hypothesis Testing
U
Universal design principles 8.3 Ethics in Visualization and Reporting
unstructured data Introduction
unstructured dataset 1.3 Data and Datasets
Unsupervised learning 6.1 What Is Machine Learning?
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution-NonCommercial-ShareAlike License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
Citation information

© Dec 19, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.