15.1 Introduction to data science
1.
c.
Data acquisition is the first stage of the data science life cycle. Data can be collected by the data scientist or gathered previously and provided to the data scientist.
2.
b.
The data science life cycle has four stages: 1) data acquisition, 2) data exploration, 3) data analysis, and 4) reporting.
6.
a.
Google Colaboratory is a Google document that can be shared with other Google accounts for viewing or editing the content.
15.2 NumPy
2.
a.
ndarray
is a data type supported by NumPy. When printing the type of an ndarray
object, <'numpy.ndarray'>
is printed.
3.
d.
NumPy array is optimized for computational and memory efficiency while also offering array-oriented computation.
5.
c.
np.array([[1, 2], [1, 2], [1, 2]])
creates an ndarray
of 3 by 2. The T
operator transposes the array. The transpose operator takes an array of m
by n
and converts the array to an n
by m
array. The result of applying the transpose operator on the given array is a 2 by 3 array.
15.3 Pandas
2.
a.
A DataFrame object can be considered a collection of one-dimensional labeled objects represented by Series objects.
3.
b.
The Pandas DataFrame can store columns of varying data types, while NumPy only supports numeric data types.
4.
c.
The function
head()
returns a DataFrame's top rows. If an argument is not specified, the default number of returned rows is five.
5.
c.
The
unique()
function when applied to a column returns the unique values (rows) in the given column.
6.
a.
When the function
describe()
is applied to a DataFrame, summary statistics of numerical columns will be generated.
15.4 Exploratory data analysis
6.
b.
iloc[:, 0]
selects all rows corresponding to the column index 0, which equals returning the first column.
10.
c.
The function
sum()
is applied to each column separately and sums up values in columns. The result is the number of Null
values in each column.
15.5 Data visualization
2.
c.
A box plot is used to show the distribution of a continuous variable, along with visualizing outliers, minimum, maximum, and quartiles.
4.
c.
The function call
plt.scatter(x, y)
creates a scatter plot based on the data stored in the variables x
and y
.
5.
b.
The given code plots a bar plot with four bars representing the given categories. The height of the bars corresponds to the values stored in the
values
variable.