
15.1 Introduction to data science

1.
c. Data acquisition is the first stage of the data science life cycle. Data can be collected by the data scientist or gathered previously and provided to the data scientist.
2.
b. The data science life cycle has four stages: 1) data acquisition, 2) data exploration, 3) data analysis, and 4) reporting.
3.
c. The data exploration stage includes data cleaning and visualization.
4.
a. Python is one of the most popular programming languages in data science.
5.
b. NumPy is a Python library used in numerical data analysis.
6.
a. A Google Colaboratory notebook is a document that can be shared with other Google accounts for viewing or editing the content.

15.2 NumPy

1.
c. To create an m by n matrix of all zeros, np.zeros((m,n)) can be used.
2.
a. ndarray is a data type supported by NumPy. When printing the type of an ndarray object, <class 'numpy.ndarray'> is printed.
3.
d. A NumPy array is optimized for computational and memory efficiency while also offering array-oriented computation.
4.
b. 2 * arr multiplies every element by 2, resulting in:
[[2 4]
 [6 8]]
5.
c. np.array([[1, 2], [1, 2], [1, 2]]) creates a 3 by 2 ndarray. The T attribute transposes the array, converting an m by n array to an n by m array. The result of transposing the given array is a 2 by 3 array.
6.
a. The result of element-wise multiplication between arr1 and arr2 is:
[[1 0]
 [0 4]]
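
The NumPy answers above can be checked with a short sketch. The sizes m = 2 and n = 3 are arbitrary, arr is inferred from the printed result in answer 4, and arr1 and arr2 are assumed operands (the exercise's own arrays are not reproduced here) chosen only so that their element-wise product matches answer 6.

```python
import numpy as np

# An m by n matrix of zeros; m = 2 and n = 3 are arbitrary example sizes.
zeros = np.zeros((2, 3))
print(zeros.shape)   # (2, 3)
print(type(zeros))   # <class 'numpy.ndarray'>

# arr is inferred from the result shown in answer 4.
arr = np.array([[1, 2], [3, 4]])
doubled = 2 * arr    # multiplies every element by 2
print(doubled)

# A 3 by 2 array becomes a 2 by 3 array when transposed.
tall = np.array([[1, 2], [1, 2], [1, 2]])
print(tall.shape, tall.T.shape)   # (3, 2) (2, 3)

# Assumed operands: * multiplies matching positions, not rows by columns.
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[1, 0], [0, 1]])
product = arr1 * arr2
print(product)
```

Note that * on ndarrays is element-wise; matrix multiplication would use the @ operator instead.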

15.3 Pandas

1.
a. Series is a Pandas data structure representing one-dimensional labeled data.
2.
a. A DataFrame object can be considered a collection of one-dimensional labeled objects represented by Series objects.
3.
b. A Pandas DataFrame can store columns of varying data types, while a NumPy array holds elements of a single data type.
4.
c. The function head() returns a DataFrame's top rows. If an argument is not specified, the default number of returned rows is five.
5.
c. The unique() function when applied to a column returns the unique values (rows) in the given column.
6.
a. When the function describe() is applied to a DataFrame, summary statistics of numerical columns will be generated.
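
A minimal sketch of the Pandas functions named above. The DataFrame contents and column names (name, team, score) are made up for illustration and do not come from the exercises.

```python
import pandas as pd

# Hypothetical data: two text columns and one numeric column.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee", "Eve", "Flo"],
    "team": ["red", "blue", "red", "blue", "red", "green"],
    "score": [90, 85, 77, 92, 68, 81],
})

# Each column of a DataFrame is a Series.
column = df["score"]
print(type(column))

# head() with no argument returns the first five rows.
top = df.head()
print(len(top))

# unique() returns the distinct values of a column.
teams = df["team"].unique()
print(teams)

# describe() summarizes the numerical columns (only "score" here).
summary = df.describe()
print(summary)
```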

15.4 Exploratory data analysis

1.
a. The element in the first row and the first column is a.
2.
c. The element in the second row and the second column is 20.
3.
a. The element at row label 2 and column label A is c.
4.
b. df.loc[0:2, "A"] returns rows 0 to 2 (inclusive) of column A.
5.
c. loc[1] returns the row with label 1, which corresponds to the second row of the DataFrame.
6.
b. iloc[:, 0] selects all rows corresponding to the column index 0, which equals returning the first column.
7.
a. The condition returns all rows where the value in the column with label A is divisible by 2.
8.
b. isnull() returns a Boolean DataFrame representing whether data entries are Null or not.
9.
a. fillna() replaces all Null values with the value passed as an argument.
10.
c. The function sum() is applied to each column separately and sums up the values in that column. Because True counts as 1 when summed, the result is the number of Null values in each column.
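
The selection and missing-value answers above can be tried on a small example. The DataFrame below is hypothetical (it is not the one from the exercises), and it assumes the default integer row labels 0 to 3.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing entries in column B.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10.0, np.nan, 30.0, np.nan]})

# iloc selects by position: all rows of the column at index 0.
first_col = df.iloc[:, 0]

# loc selects by label, and its slice includes the end label.
by_label = df.loc[0:2, "A"]   # rows labeled 0, 1, and 2
by_pos = df.iloc[0:2, 0]      # positions 0 and 1 only

# loc[1] returns the row labeled 1, here the second row.
row1 = df.loc[1]

# Boolean condition: keep rows where A is divisible by 2.
even = df[df["A"] % 2 == 0]

# isnull() gives a Boolean DataFrame; summing it counts Nulls per column.
null_mask = df.isnull()
null_counts = null_mask.sum()

# fillna() returns a copy with every Null replaced by the argument.
filled = df.fillna(0)

print(null_counts)
print(filled)
```

The difference between by_label and by_pos is the main thing to notice: loc slices are inclusive of the end label, while iloc slices follow the usual Python rule and exclude the end position.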

15.5 Data visualization

1.
b. A histogram is used to plot the distribution of continuous variables.
2.
c. A box plot is used to show the distribution of a continuous variable, along with visualizing outliers, minimum, maximum, and quartiles.
3.
a. A line plot is used to show trends and changes over time between two variables.
4.
c. The function call plt.scatter(x, y) creates a scatter plot based on the data stored in the variables x and y.
5.
b. The given code plots a bar plot with four bars representing the given categories. The height of the bars corresponds to the values stored in the values variable.
6.
b. Plotly is a library that creates interactive visualizations.
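
A minimal sketch of the scatter plot and bar plot calls described above, using Matplotlib. The contents of x, y, categories, and values are made-up sample data; only the variable names x, y, and values come from the answers.

```python
import matplotlib.pyplot as plt

# Made-up sample data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
categories = ["A", "B", "C", "D"]
values = [3, 7, 5, 2]

# Scatter plot: one point per (x, y) pair.
plt.figure()
points = plt.scatter(x, y)

# Bar plot: one bar per category, with height taken from values.
plt.figure()
bars = plt.bar(categories, values)

# Each bar's height matches the corresponding entry in values.
heights = [bar.get_height() for bar in bars]
print(heights)
```

In a script, calling plt.show() afterward displays both figures; in a notebook they are typically rendered automatically.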
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/introduction-python-programming/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/introduction-python-programming/pages/1-introduction
Citation information

© Mar 15, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

This book utilizes the OpenStax Python Code Runner. The code runner is developed by Wiley and is All Rights Reserved.