This appendix summarizes the Python algorithms used in this textbook. It is intended as a cross-reference for students: each entry gives a general description of an algorithm and identifies the section of the text where it is first used.
For more details on Python functions, syntax, and usage, please refer to Appendix D: Review of Python Functions and/or the Python documentation online.
Chapter Title | Topic | Description of Algorithm | First Reference |
---|---|---|---|
What Are Data and Data Science? | Loading and viewing data | Using Python pandas library to load a CSV file, describe its features, and explore its contents | Python Basics for Data Science |
What Are Data and Data Science? | Visualizing data using Python | Using Python to create a scatterplot for two numeric quantities | Python Basics for Data Science |
Collecting and Preparing Data | Scraping data from a website | Using Python to extract a data table from a website | Web Scraping and Social Media Data Collection |
Collecting and Preparing Data | Using regular expressions in Python | Using Python to search for a selected word in a given string and output the number of times it appears | Web Scraping and Social Media Data Collection |
Collecting and Preparing Data | Processing and storing data | Using Python to process data and output to a CSV file | Web Scraping and Social Media Data Collection |
Collecting and Preparing Data | Parsing and extracting data | Using Python to parse and extract specific data from a given dataset | Web Scraping and Social Media Data Collection |
Descriptive Statistics: Statistical Measurements and Probability Distributions | Calculate binomial probabilities | Using Python to calculate probabilities associated with the binomial distribution | Discrete and Continuous Probability Distributions |
Descriptive Statistics: Statistical Measurements and Probability Distributions | Calculate probabilities from a normal distribution | Using Python to calculate probabilities associated with the normal distribution | Discrete and Continuous Probability Distributions |
Inferential Statistics and Regression Analysis | Computing margin of error | Using Python library scipy.stats to compute a margin of error using the t-distribution | Statistical Inference and Confidence Intervals |
Inferential Statistics and Regression Analysis | Confidence interval using bootstrapping | Using Python to calculate a confidence interval using a bootstrapping approach | Statistical Inference and Confidence Intervals |
Inferential Statistics and Regression Analysis | Confidence interval for the mean | Using Python to calculate a confidence interval for the mean using the normal and t-distributions (two examples) | Statistical Inference and Confidence Intervals |
Inferential Statistics and Regression Analysis | Hypothesis test for the mean (one sample) | Using Python to calculate a p-value associated with a hypothesis test for the mean | Hypothesis Testing |
Inferential Statistics and Regression Analysis | Hypothesis test for a proportion | Using Python to calculate a p-value associated with a hypothesis test for a proportion | Hypothesis Testing |
Inferential Statistics and Regression Analysis | Hypothesis test for the mean (one sample) | Using Python to calculate a test statistic and p-value associated with a hypothesis test for the mean | Hypothesis Testing |
Inferential Statistics and Regression Analysis | Hypothesis test for the mean (two samples) | Using Python to calculate a test statistic and p-value associated with a hypothesis test for the difference between two means | Hypothesis Testing |
Inferential Statistics and Regression Analysis | Correlation coefficient | Using Python to calculate the Pearson correlation coefficient for two numeric variables | Correlation and Linear Regression Analysis |
Inferential Statistics and Regression Analysis | Creating a scatterplot | Using Python to create a scatterplot for two numeric quantities | Correlation and Linear Regression Analysis |
Inferential Statistics and Regression Analysis | Creating a linear regression model | Using Python to calculate the slope and intercept for a linear regression model | Correlation and Linear Regression Analysis |
Inferential Statistics and Regression Analysis | One-way analysis of variance (ANOVA) | Using Python to conduct a one-way analysis of variance hypothesis test | Analysis of Variance (ANOVA) |
Inferential Statistics and Regression Analysis | Visualizing time series data | Using basic Python plot routine and matplotlib.pyplot to generate a time series graph based on a pandas DataFrame (two examples) | Analysis of Variance (ANOVA) |
Time Series and Forecasting | Plotting a simple moving average (SMA) and differencing | Using Python to generate a simple moving average and first-order difference for time series data | Time Series Forecasting Methods |
Time Series and Forecasting | Decomposing a time series into components | Using Python autocorrelation function and STL (seasonal and trend decomposition using LOESS) model to decompose time series data into its components | Time Series Forecasting Methods |
Time Series and Forecasting | Exponential moving average (EMA) | Using Python to calculate exponential moving average for time series data | Time Series Forecasting Methods |
Time Series and Forecasting | Autoregressive integrated moving average (ARIMA) | Using Python to calculate autoregressive integrated moving average model for time series data | Time Series Forecasting Methods |
Time Series and Forecasting | Autoregressive integrated moving average (ARIMA) | Using Python to fit and plot an autoregressive integrated moving average and use it to make forecasts for time series data | Time Series Forecasting Methods |
Time Series and Forecasting | Forecasting methods | Using Python to create and plot a forecasting model with confidence intervals for time series data | Forecast Evaluation Methods |
Decision-Making Using Machine Learning Basics | Logistic regression | Using Python to fit a logistic regression model and assess the accuracy of the model | Classification Using Machine Learning |
Decision-Making Using Machine Learning Basics | K-means clustering | Using Python to produce a k-means clustering model from a dataset | Classification Using Machine Learning |
Decision-Making Using Machine Learning Basics | DBSCAN clustering | Using Python to generate a density-based spatial clustering of applications with noise (DBSCAN) model | Classification Using Machine Learning |
Decision-Making Using Machine Learning Basics | Confusion matrix | Using Python to generate a confusion matrix | Classification Using Machine Learning |
Decision-Making Using Machine Learning Basics | Linear regression with bootstrapping | Using Python to generate a linear regression model using a bootstrapping method | Machine Learning in Regression Analysis |
Decision-Making Using Machine Learning Basics | Multiple regression | Using Python to perform multiple regression analysis | Machine Learning in Regression Analysis |
Decision-Making Using Machine Learning Basics | Three-dimensional scatterplot | Using Python to generate a three-dimensional plot of a multiple regression model | Machine Learning in Regression Analysis |
Decision-Making Using Machine Learning Basics | Mesh grid | Using Python to generate a mesh grid plot of a multiple regression model | Machine Learning in Regression Analysis |
Decision-Making Using Machine Learning Basics | Multiple logistic regression | Using Python to perform multiple logistic regression analysis and generate the corresponding confusion matrix | Machine Learning in Regression Analysis |
Decision-Making Using Machine Learning Basics | Decision trees | Using Python to generate decision trees | Decision Trees |
Decision-Making Using Machine Learning Basics | Random forests | Using Python to train a random forest model and analyze the importance of each feature | Other Machine Learning Techniques |
Decision-Making Using Machine Learning Basics | Gaussian naïve Bayes | Using Python to perform Gaussian naïve Bayes analysis | Other Machine Learning Techniques |
Deep Learning and Artificial Intelligence (AI) Basics | Perceptrons | Using Python to train and test a perceptron classification model and assess the accuracy of the model | Introduction to Neural Networks |
Deep Learning and Artificial Intelligence (AI) Basics | Training a neural network with backpropagation | Using Python’s TensorFlow library to train and test a neural network classification model using backpropagation and assess the accuracy of the model | Backpropagation |
Deep Learning and Artificial Intelligence (AI) Basics | Recurrent neural networks | Using Python’s TensorFlow library to train and test a classification model using recurrent neural networks (RNN) and assess the accuracy of the model | Backpropagation |
Deep Learning and Artificial Intelligence (AI) Basics | Predict future values | Using Python to predict future values for a classification model using recurrent neural networks (RNN) | Backpropagation |
Deep Learning and Artificial Intelligence (AI) Basics | Plot predicted values | Using Python to plot future values versus original data for a classification model using recurrent neural networks | Backpropagation |
Deep Learning and Artificial Intelligence (AI) Basics | Deep learning | Using Python’s TensorFlow library to train and test a classification model using deep learning and assess the accuracy of the model | Backpropagation |
Visualizing Data | Data visualization using boxplots | Using Python to create boxplots | Encoding Univariate Data |
Visualizing Data | Data visualization using histograms | Using Python to create histograms | Encoding Univariate Data |
Visualizing Data | Data visualization using Pareto charts | Using Python to create Pareto charts | Encoding Univariate Data |
Visualizing Data | Data visualization using time series charts | Using Python to create time series charts | Encoding Data That Change over Time |
Visualizing Data | Data visualization for binomial probabilities | Using Python to create graphs associated with the binomial distribution | Graphing Probability Distributions |
Visualizing Data | Data visualization for Poisson probabilities | Using Python to create graphs associated with the Poisson distribution | Graphing Probability Distributions |
Visualizing Data | Data visualization for normal probabilities | Using Python to create graphs associated with the normal distribution | Graphing Probability Distributions |
Visualizing Data | Heatmaps | Using Python to create heatmaps | Geospatial and Heatmap Data Visualization Using Python |
Visualizing Data | Scatterplots with colormaps | Using Python to create scatterplots with colormaps | Multivariate and Network Data Visualization Using Python |
Visualizing Data | Correlation heatmaps | Using Python to create correlation heatmaps | Multivariate and Network Data Visualization Using Python |
Visualizing Data | Data visualization using three-dimensional plots | Using Python to create three-dimensional plots | Multivariate and Network Data Visualization Using Python |
Reporting Results | Identify data characteristics | Using Python to identify data characteristics of a dataset | Validating Your Model |
Reporting Results | Decision tree | Using Python to run a decision tree and generate a visualization of the result | Validating Your Model |
Reporting Results | Model validation using Bayesian information criterion (BIC) | Using Python to perform cross validation with Bayesian information criterion | Validating Your Model |
Reporting Results | Monte Carlo simulation | Using Python to perform Monte Carlo simulation | Validating Your Model |
Reporting Results | Executive summary | Using Python to create an executive summary report | Effective Executive Summaries |
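
As a brief illustration of how one entry in the table might look in practice, the sketch below follows the description given for the regular expressions entry: it searches a string for a selected word and outputs the number of times it appears. This is a minimal sketch, not the code from the referenced section. The function name `count_word`, the sample string, and the choice of whole-word, case-insensitive matching are all assumptions made for illustration.

```python
import re


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`.

    Hypothetical helper written for this illustration only.
    """
    # re.escape protects against special regex characters in the search word;
    # the \b anchors restrict matches to whole words, so "data" does not
    # match inside "database".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags=re.IGNORECASE)
    return len(pattern.findall(text))


# Sample string invented for this illustration.
sample = "Data science turns raw data into insight; good data matters."
print(count_word(sample, "data"))  # prints 3
```

Dropping the `\b` anchors would count substring matches instead, and omitting `re.IGNORECASE` would make the search case-sensitive. Which behavior is appropriate depends on the question being asked of the text.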