This appendix summarizes the Python algorithms used in this textbook. It serves as a cross-reference for students: each entry gives a general description of an algorithm and identifies the section of the text where it first appears.
For more details on Python functions, syntax, and usage, refer to Appendix D: Review of Python Functions or the online Python documentation.
| Chapter Title | Topic | Description of Algorithm | First Reference |
|---|---|---|---|
| What Are Data and Data Science? | Loading and viewing data | Using the Python pandas library to load a CSV file, describe its features, and explore its contents | Python Basics for Data Science |
| | Visualizing data using Python | Using Python to create a scatterplot for two numeric quantities | Python Basics for Data Science |
| Collecting and Preparing Data | Scraping data from a website | Using Python to extract a data table from a website | Web Scraping and Social Media Data Collection |
| | Using regular expressions in Python | Using Python to search for a selected word in a given string and output the number of times it appears | Web Scraping and Social Media Data Collection |
| | Processing and storing data | Using Python to process data and output to a CSV file | Web Scraping and Social Media Data Collection |
| | Parsing and extracting data | Using Python to parse and extract specific data from a given dataset | Web Scraping and Social Media Data Collection |
| Descriptive Statistics: Statistical Measurements and Probability Distributions | Calculate binomial probabilities | Using Python to calculate probabilities associated with the binomial distribution | Discrete and Continuous Probability Distributions |
| | Calculate probabilities from a normal distribution | Using Python to calculate probabilities associated with the normal distribution | Discrete and Continuous Probability Distributions |
| Inferential Statistics and Regression Analysis | Computing margin of error | Using the Python library scipy.stats to compute a margin of error using the t-distribution | Statistical Inference and Confidence Intervals |
| | Confidence interval using bootstrapping | Using Python to calculate a confidence interval using a bootstrapping approach | Statistical Inference and Confidence Intervals |
| | Confidence interval for the mean | Using Python to calculate a confidence interval for the mean using the normal and t-distributions (two examples) | Statistical Inference and Confidence Intervals |
| | Hypothesis test for the mean (one sample) | Using Python to calculate a p-value associated with a hypothesis test for the mean | Hypothesis Testing |
| | Hypothesis test for a proportion | Using Python to calculate a p-value associated with a hypothesis test for a proportion | Hypothesis Testing |
| | Hypothesis test for the mean (one sample) | Using Python to calculate a test statistic and p-value associated with a hypothesis test for the mean | Hypothesis Testing |
| | Hypothesis test for the mean (two samples) | Using Python to calculate a test statistic and p-value associated with a hypothesis test for the difference between two means | Hypothesis Testing |
| | Correlation coefficient | Using Python to calculate the Pearson correlation coefficient for two numeric variables | Correlation and Linear Regression Analysis |
| | Creating a scatterplot | Using Python to create a scatterplot for two numeric quantities | Correlation and Linear Regression Analysis |
| | Creating a linear regression model | Using Python to calculate the slope and intercept for a linear regression model | Correlation and Linear Regression Analysis |
| | One-way analysis of variance (ANOVA) | Using Python to conduct a one-way analysis of variance hypothesis test | Analysis of Variance (ANOVA) |
| | Visualizing time series data | Using a basic Python plot routine and matplotlib.pyplot to generate a time series graph based on a pandas DataFrame (two examples) | Analysis of Variance (ANOVA) |
| Time Series and Forecasting | Plotting a simple moving average (SMA) and differencing | Using Python to generate a simple moving average and first-order difference for time series data | Time Series Forecasting Methods |
| | Decomposing a time series into components | Using the Python autocorrelation function and the STL (seasonal and trend decomposition using LOESS) model to decompose time series data into its components | Time Series Forecasting Methods |
| | Exponential moving average (EMA) | Using Python to calculate an exponential moving average for time series data | Time Series Forecasting Methods |
| | Autoregressive integrated moving average (ARIMA) | Using Python to calculate an autoregressive integrated moving average model for time series data | Time Series Forecasting Methods |
| | Autoregressive integrated moving average (ARIMA) | Using Python to fit and plot an autoregressive integrated moving average model and use it to make forecasts for time series data | Time Series Forecasting Methods |
| | Forecasting methods | Using Python to create and plot a forecasting model with confidence intervals for time series data | Forecast Evaluation Methods |
| Decision-Making Using Machine Learning Basics | Logistic regression | Using Python to fit a logistic regression model and assess the accuracy of the model | Classification Using Machine Learning |
| | K-means clustering | Using Python to produce a k-means clustering model from a dataset | Classification Using Machine Learning |
| | DBSCAN clustering | Using Python to generate a density-based spatial clustering of applications with noise (DBSCAN) model | Classification Using Machine Learning |
| | Confusion matrix | Using Python to generate a confusion matrix | Classification Using Machine Learning |
| | Linear regression with bootstrapping | Using Python to generate a linear regression model using a bootstrapping method | Machine Learning in Regression Analysis |
| | Multiple regression | Using Python to perform multiple regression analysis | Machine Learning in Regression Analysis |
| | Three-dimensional scatterplot | Using Python to generate a three-dimensional plot of a multiple regression model | Machine Learning in Regression Analysis |
| | Mesh grid | Using Python to generate a mesh grid plot of a multiple regression model | Machine Learning in Regression Analysis |
| | Multiple logistic regression | Using Python to perform multiple logistic regression analysis and generate the corresponding confusion matrix | Machine Learning in Regression Analysis |
| | Decision trees | Using Python to generate decision trees | Decision Trees |
| | Random forests | Using Python to train a random forest model and analyze the importance of each feature | Other Machine Learning Techniques |
| | Gaussian naïve Bayes | Using Python to perform Gaussian naïve Bayes analysis | Other Machine Learning Techniques |
| Deep Learning and Artificial Intelligence (AI) Basics | Perceptrons | Using Python to train and test a perceptron classification model and assess the accuracy of the model | Introduction to Neural Networks |
| | Training a neural network with backpropagation | Using Python’s TensorFlow library to train and test a neural network classification model using backpropagation and assess the accuracy of the model | Backpropagation |
| | Recurrent neural networks | Using Python’s TensorFlow library to train and test a classification model using recurrent neural networks (RNN) and assess the accuracy of the model | Backpropagation |
| | Predict future values | Using Python to predict future values for a classification model using recurrent neural networks (RNN) | Backpropagation |
| | Plot predicted values | Using Python to plot future values versus original data for a classification model using recurrent neural networks | Backpropagation |
| | Deep learning | Using Python’s TensorFlow library to train and test a classification model using deep learning and assess the accuracy of the model | Backpropagation |
| Visualizing Data | Data visualization using boxplots | Using Python to create boxplots | Encoding Univariate Data |
| | Data visualization using histograms | Using Python to create histograms | Encoding Univariate Data |
| | Data visualization using Pareto charts | Using Python to create Pareto charts | Encoding Univariate Data |
| | Data visualization using time series charts | Using Python to create time series charts | Encoding Data That Change over Time |
| | Data visualization for binomial probabilities | Using Python to create graphs associated with the binomial distribution | Graphing Probability Distributions |
| | Data visualization for Poisson probabilities | Using Python to create graphs associated with the Poisson distribution | Graphing Probability Distributions |
| | Data visualization for normal probabilities | Using Python to create graphs associated with the normal distribution | Graphing Probability Distributions |
| | Heatmaps | Using Python to create heatmaps | Geospatial and Heatmap Data Visualization Using Python |
| | Scatterplots with colormaps | Using Python to create scatterplots with colormaps | Multivariate and Network Data Visualization Using Python |
| | Correlation heatmaps | Using Python to create correlation heatmaps | Multivariate and Network Data Visualization Using Python |
| | Data visualization using three-dimensional plots | Using Python to create three-dimensional plots | Multivariate and Network Data Visualization Using Python |
| Reporting Results | Identify data characteristics | Using Python to identify data characteristics of a dataset | Validating Your Model |
| | Decision tree | Using Python to run a decision tree and generate a visualization of the result | Validating Your Model |
| | Model validation using Bayesian information criterion (BIC) | Using Python to perform cross validation with Bayesian information criterion | Validating Your Model |
| | Monte Carlo simulation | Using Python to perform Monte Carlo simulation | Validating Your Model |
| | Executive summary | Using Python to create an executive summary report | Effective Executive Summaries |
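
As a quick illustration of the kind of algorithm cataloged above, the following minimal sketch covers two table entries using only the Python standard library: counting how many times a selected word appears in a string with a regular expression, and calculating a binomial probability. The function names `count_word` and `binomial_pmf` and the sample sentence are hypothetical, chosen for this illustration; the examples in the referenced sections may use different code or libraries such as scipy.stats.

```python
import math
import re


def count_word(text: str, word: str) -> int:
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    # \b anchors the match at word boundaries, so "data" does not match "database".
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical sample text for illustration only.
sample = "Data science turns raw data into insight; good data matters."

print(count_word(sample, "data"))          # prints 3
print(round(binomial_pmf(3, 10, 0.5), 4))  # prints 0.1172
```

The word boundary anchors are a design choice: without them, a search for "data" would also count substrings of longer words, which is usually not what a word count intends.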