Principles of Data Science

Appendix C: Review of Python Algorithms

This appendix summarizes the Python algorithms used in this textbook. It is intended as a cross-reference: for each algorithm, the table gives a general description and the section of the text where the algorithm first appears.

For more details on Python functions, syntax, and usage, see Appendix D: Review of Python Functions or the online Python documentation.

| Chapter Title | Topic | Description of Algorithm | First Reference |
|---|---|---|---|
| What Are Data and Data Science? | Loading and viewing data | Using the Python `pandas` library to load a CSV file, describe its features, and explore its contents | Python Basics for Data Science |
| | Visualizing data using Python | Using Python to create a scatterplot for two numeric quantities | Python Basics for Data Science |
| Collecting and Preparing Data | Scraping data from a website | Using Python to extract a data table from a website | Web Scraping and Social Media Data Collection |
| | Using regular expressions in Python | Using Python to search for a selected word in a given string and output the number of times it appears | Web Scraping and Social Media Data Collection |
| | Processing and storing data | Using Python to process data and output to a CSV file | Web Scraping and Social Media Data Collection |
| | Parsing and extracting data | Using Python to parse and extract specific data from a given dataset | Web Scraping and Social Media Data Collection |
| Descriptive Statistics: Statistical Measurements and Probability Distributions | Calculate binomial probabilities | Using Python to calculate probabilities associated with the binomial distribution | Discrete and Continuous Probability Distributions |
| | Calculate probabilities from a normal distribution | Using Python to calculate probabilities associated with the normal distribution | Discrete and Continuous Probability Distributions |
| Inferential Statistics and Regression Analysis | Computing margin of error | Using the Python library `scipy.stats` to compute a margin of error using the t-distribution | Statistical Inference and Confidence Intervals |
| | Confidence interval using bootstrapping | Using Python to calculate a confidence interval using a bootstrapping approach | Statistical Inference and Confidence Intervals |
| | Confidence interval for the mean | Using Python to calculate a confidence interval for the mean using the normal and t-distributions (two examples) | Statistical Inference and Confidence Intervals |
| | Hypothesis test for the mean (one sample) | Using Python to calculate a p-value associated with a hypothesis test for the mean | Hypothesis Testing |
| | Hypothesis test for a proportion | Using Python to calculate a p-value associated with a hypothesis test for a proportion | Hypothesis Testing |
| | Hypothesis test for the mean (one sample) | Using Python to calculate a test statistic and p-value associated with a hypothesis test for the mean | Hypothesis Testing |
| | Hypothesis test for the mean (two samples) | Using Python to calculate a test statistic and p-value associated with a hypothesis test for the difference between two means | Hypothesis Testing |
| | Correlation coefficient | Using Python to calculate the Pearson correlation coefficient for two numeric variables | Correlation and Linear Regression Analysis |
| | Creating a scatterplot | Using Python to create a scatterplot for two numeric quantities | Correlation and Linear Regression Analysis |
| | Creating a linear regression model | Using Python to calculate the slope and intercept for a linear regression model | Correlation and Linear Regression Analysis |
| | One-way analysis of variance (ANOVA) | Using Python to conduct a one-way analysis of variance hypothesis test | Analysis of Variance (ANOVA) |
| | Visualizing time series data | Using the basic Python plot routine and `matplotlib.pyplot` to generate a time series graph based on a `pandas` DataFrame (two examples) | Analysis of Variance (ANOVA) |
| Time Series and Forecasting | Plotting a simple moving average (SMA) and differencing | Using Python to generate a simple moving average and first-order difference for time series data | Time Series Forecasting Methods |
| | Decomposing a time series into components | Using the Python autocorrelation function and the STL (seasonal and trend decomposition using LOESS) model to decompose time series data into its components | Time Series Forecasting Methods |
| | Exponential moving average (EMA) | Using Python to calculate an exponential moving average for time series data | Time Series Forecasting Methods |
| | Autoregressive integrated moving average (ARIMA) | Using Python to calculate an autoregressive integrated moving average model for time series data | Time Series Forecasting Methods |
| | Autoregressive integrated moving average (ARIMA) | Using Python to fit and plot an autoregressive integrated moving average model and use it to make forecasts for time series data | Time Series Forecasting Methods |
| | Forecasting methods | Using Python to create and plot a forecasting model with confidence intervals for time series data | Forecast Evaluation Methods |
| Decision-Making Using Machine Learning Basics | Logistic regression | Using Python to fit a logistic regression model and assess the accuracy of the model | Classification Using Machine Learning |
| | K-means clustering | Using Python to produce a k-means clustering model from a dataset | Classification Using Machine Learning |
| | DBSCAN clustering | Using Python to generate a density-based spatial clustering of applications with noise (DBSCAN) model | Classification Using Machine Learning |
| | Confusion matrix | Using Python to generate a confusion matrix | Classification Using Machine Learning |
| | Linear regression with bootstrapping | Using Python to generate a linear regression model using a bootstrapping method | Machine Learning in Regression Analysis |
| | Multiple regression | Using Python to perform multiple regression analysis | Machine Learning in Regression Analysis |
| | Three-dimensional scatterplot | Using Python to generate a three-dimensional plot of a multiple regression model | Machine Learning in Regression Analysis |
| | Mesh grid | Using Python to generate a mesh grid plot of a multiple regression model | Machine Learning in Regression Analysis |
| | Multiple logistic regression | Using Python to perform multiple logistic regression analysis and generate the corresponding confusion matrix | Machine Learning in Regression Analysis |
| | Decision trees | Using Python to generate decision trees | Decision Trees |
| | Random forests | Using Python to train a random forest model and analyze the importance of each feature | Other Machine Learning Techniques |
| | Gaussian naïve Bayes | Using Python to perform Gaussian naïve Bayes analysis | Other Machine Learning Techniques |
| Deep Learning and Artificial Intelligence (AI) Basics | Perceptrons | Using Python to train and test a perceptron classification model and assess the accuracy of the model | Introduction to Neural Networks |
| | Training a neural network with backpropagation | Using Python’s `TensorFlow` library to train and test a neural network classification model using backpropagation and assess the accuracy of the model | Backpropagation |
| | Recurrent neural networks | Using Python’s `TensorFlow` library to train and test a classification model using recurrent neural networks (RNN) and assess the accuracy of the model | Backpropagation |
| | Predict future values | Using Python to predict future values for a classification model using recurrent neural networks (RNN) | Backpropagation |
| | Plot predicted values | Using Python to plot future values versus original data for a classification model using recurrent neural networks | Backpropagation |
| | Deep learning | Using Python’s `TensorFlow` library to train and test a classification model using deep learning and assess the accuracy of the model | Backpropagation |
| Visualizing Data | Data visualization using boxplots | Using Python to create boxplots | Encoding Univariate Data |
| | Data visualization using histograms | Using Python to create histograms | Encoding Univariate Data |
| | Data visualization using Pareto charts | Using Python to create Pareto charts | Encoding Univariate Data |
| | Data visualization using time series charts | Using Python to create time series charts | Encoding Data That Change over Time |
| | Data visualization for binomial probabilities | Using Python to create graphs associated with the binomial distribution | Graphing Probability Distributions |
| | Data visualization for Poisson probabilities | Using Python to create graphs associated with the Poisson distribution | Graphing Probability Distributions |
| | Data visualization for normal probabilities | Using Python to create graphs associated with the normal distribution | Graphing Probability Distributions |
| | Heatmaps | Using Python to create heatmaps | Geospatial and Heatmap Data Visualization Using Python |
| | Scatterplots with colormaps | Using Python to create scatterplots with colormaps | Multivariate and Network Data Visualization Using Python |
| | Correlation heatmaps | Using Python to create correlation heatmaps | Multivariate and Network Data Visualization Using Python |
| | Data visualization using three-dimensional plots | Using Python to create three-dimensional plots | Multivariate and Network Data Visualization Using Python |
| Reporting Results | Identify data characteristics | Using Python to identify data characteristics of a dataset | Validating Your Model |
| | Decision tree | Using Python to run a decision tree and generate a visualization of the result | Validating Your Model |
| | Model validation using Bayesian information criterion (BIC) | Using Python to perform cross validation with the Bayesian information criterion | Validating Your Model |
| | Monte Carlo simulation | Using Python to perform a Monte Carlo simulation | Validating Your Model |
| | Executive summary | Using Python to create an executive summary report | Effective Executive Summaries |
Table C1
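
Table C1 names techniques rather than showing code. As a rough illustration only, the sketch below implements four of the listed techniques (counting a word with a regular expression, a binomial probability, a simple moving average with first-order differencing, and the slope and intercept of a least-squares line) using only the Python standard library. All function names here are hypothetical, and this is not the textbook's own code, which relies on libraries such as pandas and scipy.stats.

```python
import math
import re
import statistics


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simple_moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [
        statistics.fmean(values[i : i + window])
        for i in range(len(values) - window + 1)
    ]


def first_difference(values):
    """First-order difference: each value minus the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Made-up toy inputs, chosen only to exercise each helper.
    print(count_word("Data science uses data about data.", "data"))
    print(binomial_pmf(2, 4, 0.5))
    print(simple_moving_average([1, 2, 3, 4, 5], 3))
    print(first_difference([1, 4, 9, 16]))
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
```

For real datasets, library routines are usually the better choice; the pure-Python versions are meant only to make the underlying arithmetic visible.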
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution-NonCommercial-ShareAlike License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction

© Dec 19, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.