Highlights from this chapter include:
- Data science is a multidisciplinary field that combines the collection, processing, and analysis of large volumes of data to extract insights and drive informed decision-making.
- The data science life cycle is the framework that data scientists follow to complete a data science project.
- The data science life cycle includes 1) data acquisition, 2) data exploration, 3) data analysis, and 4) reporting.
- Google Colaboratory is a cloud-based Jupyter Notebook environment that allows programmers to write, run, and share Python code online.
- NumPy (Numerical Python) is a Python library that provides support for efficient numerical operations on large, multidimensional arrays and serves as a fundamental building block for data analysis in Python.
- NumPy implements an ndarray object that allows the creation of multidimensional arrays of homogeneous data types and efficient data processing.
- NumPy provides functionality for mathematical operations, array manipulation, and linear algebra operations.
- Pandas is an open-source Python library used for data cleaning, processing, and analysis.
- Pandas provides Series and DataFrame data structures, data processing functionality, and integration with other libraries.
- Exploratory Data Analysis (EDA) is the task of analyzing data to gain insights, identify patterns, and understand the underlying structure of the data.
- A feature is an individual variable or attribute that is calculated from raw data in a dataset.
- Data indexing can be used to select and access specific rows and columns.
- Data slicing refers to selecting a subset of rows and/or columns from a DataFrame.
- Data filtering involves selecting rows or columns based on certain conditions.
- Missing values in a dataset can occur when data are not available or were not recorded properly.
- Data visualization plays a crucial role in data science for understanding the data.
- Different types of visualizations include bar plots, line plots, scatter plots, histograms, and box plots.
- Several Python data visualization libraries, including Matplotlib, Seaborn, and Plotly, offer a range of capabilities and features for creating different plot types.
- The conventional aliases for importing NumPy, Pandas, and Matplotlib.pyplot are np, pd, and plt, respectively.
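As a minimal sketch of the NumPy points in the highlights above, the following creates ndarrays, checks that each array holds a single data type, and applies vectorized arithmetic, an axis-wise aggregation, and a matrix product. The temperature values are made up for illustration, and the random numbers come from NumPy's default_rng generator, which is one of several random-number APIs NumPy offers.

```python
import numpy as np

# Create a 2-D ndarray from a nested list (made-up temperature readings)
temps = np.array([[21.5, 23.0, 19.8],
                  [22.1, 24.3, 20.4]])
print(temps.shape)   # (2, 3): 2 rows, 3 columns
print(temps.dtype)   # float64

# Arrays are homogeneous: mixing ints and floats yields one common (float) dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# Convenience constructors
zeros = np.zeros((2, 3))              # 2x3 array filled with 0.0
ones = np.ones(4)                     # 1-D array of four 1.0 values
rng = np.random.default_rng(seed=0)   # seeded so results are reproducible
noise = rng.random((2, 3))            # uniform random floats in [0, 1)

# Vectorized arithmetic: applied to every element without a Python loop
fahrenheit = temps * 9 / 5 + 32

# Aggregation along an axis: axis=0 collapses the rows, one mean per column
col_means = temps.mean(axis=0)

# Linear algebra: (2x3) @ (3x2) gives a (2x2) matrix product
gram = temps @ temps.T
print(gram.shape)    # (2, 2)
```

Because the arithmetic is vectorized, the same expressions work unchanged whether the array holds six values or six million.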
At this point, you should be able to write programs that create data structures to store different datasets and that explore and visualize those datasets.
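A minimal sketch of exploring a dataset with Pandas, covering the inspection, indexing, slicing, filtering, and missing-value operations summarized in this chapter. The DataFrame, its column names, its row labels, and its values are hypothetical; NaN entries are inserted deliberately so the missing-value methods have something to act on.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with string row labels and two deliberate NaN values
df = pd.DataFrame({
    "city": ["Austin", "Boston", "Chicago", "Denver", "Boston"],
    "month": ["Jan", "Jan", "Feb", "Feb", "Mar"],
    "sales": [250, 310, np.nan, 180, 295],
    "units": [25, 31, 22, np.nan, 30],
}, index=["a", "b", "c", "d", "e"])

# Quick inspection
print(df.head(3))                  # first three rows
df.info()                          # column names, dtypes, non-null counts
print(df.describe())               # count, mean, std, min, quartiles, max of numeric columns
print(df["city"].value_counts())   # Boston appears twice, so it is listed first
print(df["month"].unique())        # unique months in order of appearance

# Indexing: .loc uses labels, .iloc uses integer positions
boston_sales = df.loc["b", "sales"]   # 310.0
first_city = df.iloc[0, 0]            # 'Austin'

# Slicing
by_label = df.loc["b":"d", ["city", "sales"]]   # rows b, c, d (end label included)
by_position = df.iloc[1:3]                      # positions 1 and 2 (end excluded)
picked = df.iloc[[0, 4]]                        # rows at positions 0 and 4

# Filtering: a Boolean condition keeps only the rows where it is True
big_sales = df[df["sales"] > 260]               # NaN compares as False, so row c is dropped

# Missing values
missing_mask = df.isnull()                      # True wherever a value is NaN
filled = df.fillna({"sales": df["sales"].mean(), "units": 0})
complete_rows = df.dropna()                     # drop every row containing a NaN
```

Note the asymmetry in the slicing lines: a .loc label slice includes its end label, while an .iloc position slice excludes its end position, following the usual Python convention.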
Function  Description 


Creates an 

Creates an array of zeros. 

Creates an array of ones. 

Creates an array of random numbers with 

Creates an array from a CSV file. 

Creates a DataFrame from a list, dictionary, or an array. 

Creates a DataFrame from a CSV file. 

Returns the first few rows of a DataFrame. 

Returns the last few rows of a DataFrame. 

Provides a summary of the DataFrame, including the column names, data types, and the number of non-null values. 

Generates summary statistics for each numeric column: count, mean, standard deviation, minimum, maximum, and quartiles. 

Counts the occurrences of unique values in a column and presents them in descending order. 

Returns an array of unique values in a column. 

Allows for accessing data in a DataFrame using row/column labels. 

Allows for accessing data in a DataFrame using row/column integer-based indexes. 

Selects only the rows that meet the given 

Slices using label ranges. 

Slices rows that are in the list 

Returns a Boolean array with Boolean values representing whether each entry has been 

Replaces 

Removes all rows containing a 

Takes in two inputs, 

Takes in two inputs, 

Takes in two inputs, 

Takes in one input, 

Takes in one input, 
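The plot types named in the chapter highlights can be sketched with Matplotlib as follows. This is an illustrative sketch with hypothetical data: it draws one of each plot on its own subplot using the Axes methods bar, plot, scatter, hist, and boxplot (pyplot also exposes functions with the same names, such as plt.bar and plt.hist). It selects the non-interactive Agg backend so it runs without a display and saves the figure to a file named plots.png in the working directory; both choices are assumptions made for the sketch.

```python
import matplotlib
matplotlib.use("Agg")              # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data
rng = np.random.default_rng(seed=1)
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [250, 310, 280, 330]
x = rng.normal(loc=50, scale=10, size=200)   # one numeric variable
y = 2 * x + rng.normal(scale=5, size=200)    # a second variable related to x

fig, axes = plt.subplots(1, 5, figsize=(20, 4))

axes[0].bar(months, sales)               # bar plot: one bar per category
axes[0].set_title("Bar plot")

axes[1].plot(months, sales, marker="o")  # line plot: values connected in order
axes[1].set_title("Line plot")

axes[2].scatter(x, y, s=8)               # scatter plot: relationship between two variables
axes[2].set_title("Scatter plot")

axes[3].hist(x, bins=20)                 # histogram: distribution of one variable
axes[3].set_title("Histogram")

axes[4].boxplot(x)                       # box plot: median, quartiles, and outliers
axes[4].set_title("Box plot")

fig.tight_layout()
fig.savefig("plots.png")
```

In a notebook environment such as Google Colaboratory, the backend line can be dropped and the figure is displayed inline instead of (or in addition to) being saved.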