Principles of Data Science

Appendix D: Review of Python Functions

This appendix provides a summary of the Python functions used in this textbook. It is intended as a cross-reference of Python commands, giving a description of each function, the general syntax for its usage, and the section where the function is first used in the text.

Please note that these are very high-level descriptions. Many of the functions require specific libraries to be installed. For more details on Python functions, syntax, and usage, please refer to the documentation for Python and the relevant libraries posted online.

Table D1: Python functions used in this text. Each entry lists the Python function, a description of what it does, its general syntax, and the section where it first appears. Entries are grouped by chapter.
What Are Data and Data Science?

print(): Prints a specified message or specified values to the screen or other output device.
  Syntax: print("text") or print(x, y)
  First reference: Python Basics for Data Science

pd.read_csv(): Loads data from a CSV (comma-separated values) file and stores it in a DataFrame.
  Syntax: pd.read_csv(path_to_csv_datafile)
  First reference: Python Basics for Data Science

DataFrame.describe(): Returns a table with basic statistics for a dataset including min, max, mean, count, and quartiles.
  Syntax: DataFrame.describe(), where DataFrame is the name of the DataFrame
  First reference: Python Basics for Data Science

DataFrame.iloc[]: Allows access to data in a DataFrame using row/column integer-based indexes.
  Syntax: DataFrame.iloc[row, column], where DataFrame is the name of the DataFrame
  First reference: Python Basics for Data Science

DataFrame.loc[]: Used to access a group of rows and columns by labels or a Boolean array.
  Syntax: DataFrame.loc[criteria], where DataFrame is the name of the DataFrame
  First reference: Python Basics for Data Science

plt.scatter(): Generates a scatterplot for (x, y) data.
  Syntax: plt.scatter(x_data, y_data)
  First reference: Python Basics for Data Science

plt.title(): Specifies a title for a chart.
  Syntax: plt.title("Title")
  First reference: Python Basics for Data Science

plt.xlabel(): Specifies a label for the x-axis.
  Syntax: plt.xlabel("x-axis label")
  First reference: Python Basics for Data Science

plt.ylabel(): Specifies a label for the y-axis.
  Syntax: plt.ylabel("y-axis label")
  First reference: Python Basics for Data Science

plt.xlim(): Specifies limits to use for x-axis numbering.
  Syntax: plt.xlim(lower, upper)
  First reference: Python Basics for Data Science

plt.ylim(): Specifies limits to use for y-axis numbering.
  Syntax: plt.ylim(lower, upper)
  First reference: Python Basics for Data Science
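
As a quick illustration (not taken from the text), the short sketch below combines several of these functions. The column names and values are invented, and with a real CSV file the DataFrame would be loaded with pd.read_csv() instead of being built by hand. It assumes pandas and Matplotlib are installed and imported as pd and plt, as in the textbook.

import pandas as pd
import matplotlib.pyplot as plt

# Small made-up dataset standing in for a CSV file; with a real file you would
# call pd.read_csv("path/to/datafile.csv") instead.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 60, 68, 71, 80, 88],
})

print(df.describe())                   # min, max, mean, count, and quartiles per column
print(df.iloc[0, 1])                   # value at row 0, column 1 (integer positions)
print(df.loc[df["exam_score"] > 70])   # rows selected with a Boolean condition

plt.scatter(df["hours_studied"], df["exam_score"])
plt.title("Exam Score vs. Hours Studied")
plt.xlabel("Hours studied")
plt.ylabel("Exam score")
plt.xlim(0, 7)
plt.ylim(40, 100)
plt.show()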
Collecting and Preparing Data

pd.read_html(): Reads an HTML table from a web page and converts it into a DataFrame.
  Syntax: pd.read_html(URL)
  First reference: Web Scraping and Social Media Data Collection

pd.to_numeric(): Converts strings or other data types to numeric values.
  Syntax: pd.to_numeric(column_name)
  First reference: Web Scraping and Social Media Data Collection

len(): Returns the length of an object.
  Syntax: len(object)
  First reference: Web Scraping and Social Media Data Collection

re.findall(): Returns all non-overlapping matches of a specified pattern in a string.
  Syntax: re.findall(pattern, string)
  First reference: Web Scraping and Social Media Data Collection

re.search(): Checks if a specified pattern appears in a string.
  Syntax: re.search(pattern, string)
  First reference: Web Scraping and Social Media Data Collection
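
For instance, the sketch below uses a tiny hard-coded HTML table instead of a live web page; the contents are invented, and pd.read_html() additionally needs an HTML parser such as lxml installed.

import re
from io import StringIO
import pandas as pd

# A tiny HTML table used in place of a real web page; with a live page you
# would pass its URL to pd.read_html(URL) instead.
html = """
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Springfield</td><td>167000</td></tr>
  <tr><td>Shelbyville</td><td>65000</td></tr>
</table>
"""

tables = pd.read_html(StringIO(html))   # returns a list of DataFrames, one per table found
print(len(tables))                      # how many tables were found
df = tables[0]

# pd.to_numeric() converts a column of numeric strings to numeric values
codes = pd.Series(["00123", "00456"])
print(pd.to_numeric(codes))

text = "Springfield: 167000; Shelbyville: 65000"
print(re.findall(r"\d+", text))                     # every run of digits in the string
print(re.search("Shelbyville", text) is not None)   # True if the pattern appears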
Descriptive Statistics: Statistical Measurements and Probability Distributions

binom.pmf(): Calculates the probability mass function (PMF) for a binomial distribution. It gives the probability of having exactly x successes in n trials with success probability p.
  Syntax: binom.pmf(x, n, p), where x is the number of successes in the experiment, n is the number of trials in the experiment, and p is the probability of success
  First reference: Discrete and Continuous Probability Distributions

round(): Rounds a numeric result to a specified level of precision.
  Syntax: round(number, digits)
  First reference: Discrete and Continuous Probability Distributions

poisson.pmf(): Calculates probabilities associated with the Poisson distribution.
  Syntax: poisson.pmf(x, mu), where x is the number of events of interest and mu is the mean of the Poisson distribution
  First reference: Discrete and Continuous Probability Distributions

norm.cdf(): Calculates probabilities associated with the normal distribution (returns the area under the normal probability density function to the left of a specified measurement).
  Syntax: norm.cdf(x, mu, std), where x is the measurement of interest, mu is the mean of the normal distribution, and std is the standard deviation of the normal distribution
  First reference: Discrete and Continuous Probability Distributions
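
As a brief example (the numbers are chosen arbitrarily; the distributions come from scipy.stats):

from scipy.stats import binom, poisson, norm

# Binomial: probability of exactly 3 successes in 10 trials with success probability 0.5
p_binom = binom.pmf(3, 10, 0.5)

# Poisson: probability of exactly 2 events when the mean number of events is 4.5
p_poisson = poisson.pmf(2, 4.5)

# Normal: area to the left of x = 85 for a distribution with mean 80 and standard deviation 5
p_norm = norm.cdf(85, 80, 5)

# round() trims each result to four decimal places
print(round(p_binom, 4), round(p_poisson, 4), round(p_norm, 4))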
Inferential Statistics and Regression Analysis

t.ppf(): Generates the value of the t-distribution corresponding to a specified area under the t-distribution curve and specified degrees of freedom.
  Syntax: t.ppf(area_to_left, degrees_of_freedom)
  First reference: Statistical Inference and Confidence Intervals

bootstrap(): Performs the bootstrap process to generate a confidence interval.
  Syntax: bootstrap(data, statistic, confidence_level, number_resamples)
  First reference: Statistical Inference and Confidence Intervals

norm.interval(): Calculates a confidence interval for the mean when the population standard deviation is known, given the sample mean, population standard deviation, and sample size (uses the normal distribution). Note: standard error is the standard deviation divided by the square root of the sample size.
  Syntax: norm.interval(conf_level, sample_mean, standard_error)
  First reference: Statistical Inference and Confidence Intervals

t.interval(): Calculates a confidence interval for the mean when the population standard deviation is unknown, given the sample mean, sample standard deviation, and sample size (uses the t-distribution). Note: standard error is the standard deviation divided by the square root of the sample size.
  Syntax: t.interval(conf_level, degrees_freedom, sample_mean, standard_error)
  First reference: Statistical Inference and Confidence Intervals

proportion_confint(): Calculates a confidence interval for a proportion (uses the normal distribution).
  Syntax: proportion_confint(success, sample_size, alpha)
  First reference: Statistical Inference and Confidence Intervals

ttest_1samp(): Returns the value of the test statistic and the two-tailed p-value for a one-sample hypothesis test using the t-distribution.
  Syntax: ttest_1samp(data_array, null_hypothesis_mean)
  First reference: Hypothesis Testing

ttest_ind_from_stats(): Returns the value of the test statistic and the two-tailed p-value for a two-sample hypothesis test using the t-distribution.
  Syntax: ttest_ind_from_stats(sample_mean1, sample_standard_deviation1, sample_size1, sample_mean2, sample_standard_deviation2, sample_size2)
  First reference: Hypothesis Testing

np.array(): Creates a numerical array from a list-like object.
  Syntax: np.array(object)
  First reference: Correlation and Linear Regression Analysis

pearsonr(): Calculates the value of the Pearson correlation coefficient r.
  Syntax: pearsonr(x_data, y_data)
  First reference: Correlation and Linear Regression Analysis

linregress(): Generates a linear regression model and provides the slope, y-intercept, and other regression-related output.
  Syntax: linregress(x_data, y_data)
  First reference: Correlation and Linear Regression Analysis

f_oneway(): Returns both the F test statistic and the p-value for the one-way ANOVA hypothesis test.
  Syntax: f_oneway(Array1, Array2, Array3, …)
  First reference: Analysis of Variance (ANOVA)
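
A compact sketch with made-up sample values, assuming SciPy (scipy.stats) and, for proportion_confint(), statsmodels are installed:

import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportion_confint

sample = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7])

# 95% confidence interval for the mean using the t-distribution;
# standard error = sample standard deviation / sqrt(sample size)
std_err = sample.std(ddof=1) / np.sqrt(len(sample))
print(stats.t.interval(0.95, len(sample) - 1, sample.mean(), std_err))

# One-sample t-test of the null hypothesis that the population mean is 12.0
t_stat, p_value = stats.ttest_1samp(sample, 12.0)
print(t_stat, p_value)

# Confidence interval for a proportion: 40 successes out of 100, alpha = 0.05
print(proportion_confint(40, 100, 0.05))

# Correlation and simple linear regression on (x, y) data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
r, p = stats.pearsonr(x, y)
model = stats.linregress(x, y)
print(r, model.slope, model.intercept)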
Time Series and Forecasting

plot(): Generates a time series plot.
  Syntax: plot(dataframe)
  First reference: Introduction to Time Series Analysis

rolling(): Provides rolling window calculations.
  Syntax: rolling(window=window)
  First reference: Time Series Forecasting Methods

mean(): Computes the average of a dataset.
  Syntax: mean(dataset)
  First reference: Time Series Forecasting Methods

diff(): Computes the first-order difference of data in a window.
  Syntax: diff(dataframe)
  First reference: Time Series Forecasting Methods

plot_acf(): Plots the ACF (autocorrelation function) for a time series, up to lag L.
  Syntax: plot_acf(time_series_data, lags=L)
  First reference: Time Series Forecasting Methods

STL(): Decomposes a time series with known period P into its components.
  Syntax: STL(time_series_data, period=P)
  First reference: Time Series Forecasting Methods

ewm(): Performs exponential moving average (EMA) smoothing.
  Syntax: ewm(dataframe)
  First reference: Time Series Forecasting Methods

adfuller(): Performs the augmented Dickey-Fuller (ADF) test, a statistical test for checking the stationarity of a time series.
  Syntax: adfuller(time_series_data)
  First reference: Time Series Forecasting Methods

ARIMA(): Fits an ARIMA(p, d, q) (autoregressive integrated moving average) model to time series data.
  Syntax: ARIMA(time_series_data, order=(p, d, q))
  First reference: Time Series Forecasting Methods
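
A sketch on a synthetic monthly series (the data are generated purely for illustration; the time series tools here come from pandas and statsmodels):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.seasonal import STL
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly series: upward trend + yearly seasonality + noise
rng = np.random.default_rng(0)
months = pd.date_range("2018-01-01", periods=60, freq="MS")
values = (np.linspace(50, 80, 60)
          + 5 * np.sin(2 * np.pi * np.arange(60) / 12)
          + rng.normal(0, 1, 60))
series = pd.Series(values, index=months)

series.plot(title="Synthetic monthly series")   # time series plot

smoothed = series.rolling(window=12).mean()     # 12-month moving average
ema = series.ewm(span=12).mean()                # exponential moving average smoothing
differenced = series.diff()                     # first-order differences

plot_acf(series, lags=24)                       # autocorrelation up to lag 24
decomposition = STL(series, period=12).fit()    # trend / seasonal / residual components

adf_stat, p_value, *rest = adfuller(series)     # augmented Dickey-Fuller stationarity test
print(adf_stat, p_value)

model = ARIMA(series, order=(1, 1, 1)).fit()    # fit an ARIMA(1, 1, 1) model
print(model.forecast(steps=6))                  # forecast the next six months
plt.show()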
Decision-Making Using Machine Learning Basics

LogisticRegression(): Creates a logistic regression model.
  Syntax: LogisticRegression()
  First reference: Classification Using Machine Learning

model.fit(): Trains a machine learning model on a given dataset.
  Syntax: model.fit(feature_matrix, target_vector)
  First reference: Classification Using Machine Learning

KMeans(): Sets up a k-means clustering model. (Use model.fit() to fit the model to a dataset.)
  Syntax: KMeans(n_clusters=k)
  First reference: Classification Using Machine Learning

DBSCAN(): Sets up a DBSCAN (Density-Based Spatial Clustering of Applications with Noise) model. (Use model.fit() to fit the model to a dataset.)
  Syntax: DBSCAN(options)
  First reference: Classification Using Machine Learning

confusion_matrix(): Used to visualize the performance of a model by comparing actual and predicted values.
  Syntax: confusion_matrix(target_values, predicted_values)
  First reference: Classification Using Machine Learning

LinearRegression(): Fits a linear regression model to data.
  Syntax: LinearRegression().fit(feature_matrix, target_vector)
  First reference: Machine Learning in Regression Analysis

predict(): Used on trained machine learning models to generate predictions for new data points.
  Syntax: predict(feature_matrix)
  First reference: Machine Learning in Regression Analysis

DecisionTreeClassifier(): Sets up a decision tree model. (Use model.fit() to fit the model to a dataset.)
  Syntax: DecisionTreeClassifier(options)
  First reference: Decision Trees

ens.RandomForestRegressor(): Sets up a random forest model. (Use model.fit() to fit the model to a dataset.)
  Syntax: ens.RandomForestRegressor(options)
  First reference: Other Machine Learning Techniques

GaussianNB(): Sets up a Naïve Bayes classification model. (Use model.fit() to fit the model to a dataset.)
  Syntax: GaussianNB()
  First reference: Other Machine Learning Techniques
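
To illustrate, here is a sketch with a tiny made-up dataset; the estimators come from scikit-learn:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.metrics import confusion_matrix

# Tiny two-feature dataset invented for illustration
X = np.array([[1, 2], [2, 1], [2, 3], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# Supervised: fit a logistic regression classifier and check its predictions
clf = LogisticRegression()
clf.fit(X, y)                          # model.fit(feature_matrix, target_vector)
y_pred = clf.predict(X)                # predictions (here, on the training data)
print(confusion_matrix(y, y_pred))     # rows = actual classes, columns = predicted classes

# Unsupervised: k-means clustering with k = 2 clusters
km = KMeans(n_clusters=2, n_init=10)
km.fit(X)
print(km.labels_)                      # cluster assignment for each row of X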
Deep Learning and Artificial Intelligence (AI) Basics

Perceptron(): Sets up a perceptron model. (Use model.fit() to fit the model to a dataset.)
  Syntax: Perceptron()
  First reference: Introduction to Neural Networks

train_test_split(): Splits a dataset randomly into train and test subsets, using a proportion P of the data for the test set.
  Syntax: train_test_split(input_data_arrays, target_data, test_size=P)
  First reference: Introduction to Neural Networks

StandardScaler(): Used to standardize features by removing the mean and scaling to unit variance.
  Syntax: StandardScaler()
  First reference: Introduction to Neural Networks

accuracy_score(): Calculates the accuracy of a classification model as the ratio of the number of correct predictions to the total number of predictions.
  Syntax: accuracy_score(y_true, y_predicted)
  First reference: Introduction to Neural Networks

scaler.fit_transform(): Fits a scaler to the data and then transforms the data according to the fitted scaler.
  Syntax: scaler.fit_transform(array)
  First reference: Introduction to Neural Networks

scaler.transform(): Applies a previously fitted scaler to new data.
  Syntax: scaler.transform(array)
  First reference: Introduction to Neural Networks

tf.keras.Sequential(): Creates a linear stack of layers for building a neural network model.
  Syntax: tf.keras.Sequential(layers, additional options)
  First reference: Backpropagation

model.compile(): Used to configure the learning process of a neural network model before training.
  Syntax: model.compile(optimizer, loss, metrics)
  First reference: Backpropagation
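
A sketch with made-up data showing the usual split, scale, train, and score workflow, plus a small Keras model; the last part assumes TensorFlow is installed:

import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Made-up two-feature dataset for illustration
X = np.array([[1, 2], [2, 1], [2, 3], [6, 5], [7, 7], [8, 6], [1, 1], [7, 5]])
y = np.array([0, 0, 0, 1, 1, 1, 0, 1])

# Hold out 25% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardize features: fit the scaler on the training data, then apply it to the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a perceptron and measure accuracy on the held-out data
model = Perceptron()
model.fit(X_train_scaled, y_train)
print(accuracy_score(y_test, model.predict(X_test_scaled)))

# A neural network built as a linear stack of layers, then configured for training
import tensorflow as tf
net = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
net.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])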
Visualizing Data

boxplot(): Creates a box-and-whisker plot.
  Syntax: plt.boxplot(array)
  First reference: Encoding Univariate Data

hist(): Creates a histogram.
  Syntax: plt.hist(array)
  First reference: Encoding Univariate Data

plot(): Creates 2D line plots such as a time series graph.
  Syntax: plt.plot(x_data, y_data)
  First reference: Graphing Probability Distributions

bar(): Creates a bar chart.
  Syntax: plt.bar(x_array, heights)
  First reference: Graphing Probability Distributions

imshow(): Displays an image on a 2D regular raster, such as a heatmap.
  Syntax: plt.imshow(array)
  First reference: Geospatial and Heatmap Data Visualization Using Python

heatmap(): Creates a heatmap visualization.
  Syntax: sns.heatmap(array)
  First reference: Geospatial and Heatmap Data Visualization Using Python

colorbar(): Adds a color bar (colormap scale) to a figure.
  Syntax: plt.colorbar()
  First reference: Multivariate and Network Data Visualization Using Python

corr(): Calculates the pairwise correlations of columns in a DataFrame.
  Syntax: dataframe.corr()
  First reference: Multivariate and Network Data Visualization Using Python

add_subplot(): Adds a subplot to a figure stored in fig.
  Syntax: fig.add_subplot(position)
  First reference: Multivariate and Network Data Visualization Using Python

ax.scatter(): Creates a scatterplot.
  Syntax: ax.scatter(x_data, y_data)
  First reference: Multivariate and Network Data Visualization Using Python
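
For instance (random data generated only for illustration; assumes Matplotlib, seaborn, and pandas are installed):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.normal(0, 1, 100), "y": rng.normal(5, 2, 100)})

plt.boxplot(df["x"])                    # box-and-whisker plot
plt.figure()
plt.hist(df["y"], bins=10)              # histogram
plt.figure()
plt.bar(["A", "B", "C"], [3, 7, 5])     # bar chart

# Pairwise correlations shown two ways: a seaborn heatmap, and imshow with a color bar
corr = df.corr()
plt.figure()
sns.heatmap(corr, annot=True)
plt.figure()
plt.imshow(corr)
plt.colorbar()

# Scatterplot drawn on an Axes object added to a Figure
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(df["x"], df["y"])
plt.show()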
Reporting Results

plot_tree(): Creates a visualization of a decision tree.
  Syntax: plot_tree(estimator, feature_names)
  First reference: Validating Your Model

DataFrame.info(): Provides a concise summary of a DataFrame's structure and content.
  Syntax: DataFrame.info()
  First reference: Validating Your Model

DataFrame.drop(): Removes rows or columns from a DataFrame.
  Syntax: DataFrame.drop(labels, axis=rows_columns)
  First reference: Validating Your Model

score(): Evaluates the performance of a trained model on a given dataset.
  Syntax: model.score(feature_matrix, true_labels)
  First reference: Validating Your Model

dt.get_depth(): Retrieves the depth of the decision tree dt.
  Syntax: dt.get_depth()
  First reference: Validating Your Model

cross_val_score(): Evaluates a model's performance using cross-validation.
  Syntax: cross_val_score(estimator, feature_matrix, target_variable)
  First reference: Validating Your Model

GridSearchCV(): Searches for the best parameters for a specified estimator, using k-fold cross-validation.
  Syntax: GridSearchCV(estimator, parameters, k)
  First reference: Validating Your Model
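
A closing sketch using scikit-learn's built-in iris data; the tree depth and parameter grid are arbitrary choices made for illustration:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import cross_val_score, GridSearchCV

# Load a small built-in dataset as a DataFrame
iris = load_iris(as_frame=True)
df = iris.frame
df.info()                                          # summary of structure and content

X = df.drop("target", axis=1)                      # drop the label column to keep the features
y = df["target"]

dt = DecisionTreeClassifier(max_depth=3, random_state=0)
dt.fit(X, y)
print(dt.score(X, y))                              # accuracy on the given data
print(dt.get_depth())                              # depth of the fitted tree

print(cross_val_score(dt, X, y, cv=5))             # 5-fold cross-validation scores

# Grid search over candidate depths with k-fold cross-validation
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    {"max_depth": [2, 3, 4, 5]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)

plot_tree(dt, feature_names=list(X.columns))       # visualize the fitted tree
plt.show()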