1. For each dataset, list the attributes.
  a. Spotify dataset
  b. CancerDoc dataset
2. For each attribute, identify the type of the data based on the following criteria and explain why:
  • Numeric vs. categorical
  • If it is numeric, continuous vs. discrete; if it is categorical, nominal vs. ordinal
  a. “artist_count” attribute of the Spotify dataset
  b. “mode” attribute of the Spotify dataset
  c. “key” attribute of the Spotify dataset
  d. the second column in the CancerDoc dataset
3. For each dataset, identify whether the dataset is structured or unstructured. Explain why.
  a. Spotify dataset
  b. CancerDoc dataset
4. For each dataset, list the first data entry.
  a. Spotify dataset
  b. CancerDoc dataset
5. Open the WikiHow dataset (ch1-wikiHow.json) and list the attributes of the dataset.
6. Draw a scatterplot of bpm (x-axis) versus danceability (y-axis) for the Spotify dataset using:
  a. Python Matplotlib
  b. A spreadsheet program such as MS Excel or Google Sheets (Hint: Search “Scatterplot” in Help.)
7. Regenerate the scatterplot of the Spotify dataset, but with a custom title and custom x- and y-axis labels. The title should be “BPM vs. Danceability.” The x-axis should be labeled “bpm” and range from the minimum to the maximum bpm value. The y-axis should be labeled “danceability” and range from the minimum to the maximum danceability value.
  a. Python Matplotlib (Hint: The DataFrame.min() and DataFrame.max() methods return the minimum and maximum values of each column of a DataFrame. You can also call these methods on a specific column. For example, if a DataFrame is named df and has a column named "col1", df["col1"].min() returns the minimum value of the "col1" column of df.)
  b. A spreadsheet program such as MS Excel or Google Sheets (Hint: Calculate the minimum and maximum values of each column elsewhere in the sheet first, then use those values when editing the scatterplot.)
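The hint about min() and max() can be illustrated with a small sketch. It uses toy data, not the Spotify dataset, and the column names "col1" and "col2" and the title text are placeholders chosen for illustration. It shows one possible way to pass column minimum and maximum values to the axis limits of a Matplotlib scatterplot.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data only: "col1" and "col2" are placeholder column names,
# not columns of the Spotify dataset.
df = pd.DataFrame({
    "col1": [3, 1, 4, 1, 5],
    "col2": [9.0, 2.0, 6.0, 5.0, 3.0],
})

# Calling min()/max() on a single column returns one value per call.
x_min, x_max = df["col1"].min(), df["col1"].max()
y_min, y_max = df["col2"].min(), df["col2"].max()

fig, ax = plt.subplots()
ax.scatter(df["col1"], df["col2"])
ax.set_title("Toy scatterplot")   # custom title
ax.set_xlabel("col1")             # custom x-axis label
ax.set_ylabel("col2")             # custom y-axis label
ax.set_xlim(x_min, x_max)         # x-axis spans min..max of col1
ax.set_ylim(y_min, y_max)         # y-axis spans min..max of col2
# In an interactive session, plt.show() would display the figure.
```

Adapting the sketch to the exercise means replacing the toy DataFrame with the one you loaded and substituting the column names and labels the exercise asks for.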
8. Based on the Spotify dataset, filter the following using Python Pandas:
  a. Tracks whose artist is Taylor Swift
  b. Tracks that were sung by Taylor Swift and released earlier than 2020
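As a reminder of how row filtering generally works in Pandas, here is a minimal sketch on toy data. The column names "artist" and "year" and the values in them are assumptions made for illustration; check the actual column names in the Spotify file before adapting it.

```python
import pandas as pd

# Toy data with hypothetical column names; the real dataset may differ.
df = pd.DataFrame({
    "artist": ["A", "B", "A", "C"],
    "year": [2018, 2021, 2022, 2015],
})

# A comparison on a column yields a boolean Series (a mask);
# indexing the DataFrame with the mask keeps only the True rows.
by_artist = df[df["artist"] == "A"]

# Combine conditions with & (and) or | (or). Each condition needs
# its own parentheses because & binds tighter than == and <.
by_artist_before_2020 = df[(df["artist"] == "A") & (df["year"] < 2020)]

print(by_artist)
print(by_artist_before_2020)
```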
Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution-NonCommercial-ShareAlike License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
Citation information

© Dec 19, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.