2. For each item below, define the type of the data based on the following criteria and explain why:
- Numeric vs. categorical
- If it is numeric, continuous vs. discrete; if it is categorical, nominal vs. ordinal
a. “artist_count” attribute of the Spotify dataset
b. “mode” attribute of the Spotify dataset
c. “key” attribute of the Spotify dataset
d. The second column of the CancerDoc dataset
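Before classifying each attribute, it can help to look at how the column is actually stored. The sketch below is a hypothetical helper (`describe_column` is not part of pandas) that reports a column's dtype, whether pandas treats it as numeric, its number of distinct values, and a few sample values. Treat the output as evidence, not as the answer: a column stored as integers can still be categorical if the numbers only label groups, so the explanation the question asks for is still needed.

```python
import pandas as pd


def describe_column(series: pd.Series, max_samples: int = 5) -> dict:
    """Hypothetical helper: summarize a column to support a numeric/categorical decision."""
    return {
        # How pandas stores the values (e.g. int64, float64, object).
        "dtype": str(series.dtype),
        # True if pandas considers the dtype numeric; this alone does not settle the question.
        "is_numeric": bool(pd.api.types.is_numeric_dtype(series)),
        # Number of distinct non-missing values.
        "n_unique": int(series.nunique()),
        # A few distinct values, in order of first appearance.
        "samples": series.dropna().drop_duplicates().head(max_samples).tolist(),
    }
```

For example, assuming the Spotify data has been loaded with `df = pd.read_csv("spotify.csv")` (the file name is an assumption), `describe_column(df["mode"])` summarizes the “mode” attribute.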
3. For each dataset, identify its type (structured vs. unstructured) and explain why.
a. Spotify dataset
b. CancerDoc dataset
6. Draw a scatterplot of the Spotify dataset with bpm on the x-axis and danceability on the y-axis, using:
a. Python Matplotlib
b. A spreadsheet program such as MS Excel or Google Sheets (Hint: search for “Scatterplot” in the program's Help.)
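A minimal Matplotlib sketch for part (a), assuming the Spotify data is already loaded into a pandas DataFrame whose columns are named exactly “bpm” and “danceability” (adjust the names if your CSV header differs). The helper name `plot_bpm_vs_danceability` is hypothetical; it returns the figure and axes so they can be shown, saved, or inspected.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_bpm_vs_danceability(df: pd.DataFrame):
    """Hypothetical helper: scatterplot with bpm on the x-axis and danceability on the y-axis."""
    fig, ax = plt.subplots()
    # One point per track: x = bpm, y = danceability.
    ax.scatter(df["bpm"], df["danceability"])
    return fig, ax


# Example usage (the file name is an assumption):
# df = pd.read_csv("spotify.csv")
# plot_bpm_vs_danceability(df)
# plt.show()
```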
7. Regenerate the scatterplot of the Spotify dataset, this time with a custom title and custom x- and y-axis labels. The title should be “BPM vs. Danceability.” The x-axis should be labeled “bpm” and range from the minimum to the maximum bpm value. The y-axis should be labeled “danceability” and range from the minimum to the maximum danceability value.
a. Python Matplotlib (Hint: the DataFrame.min() and DataFrame.max() methods return the minimum and maximum value of each column of a DataFrame. You can also call these methods on a single column. For example, if a DataFrame named df has a column named “col1”, df["col1"].min() returns the minimum value of that column.)
b. A spreadsheet program such as MS Excel or Google Sheets (Hint: calculate the minimum and maximum values of each column elsewhere first, then enter those values as the axis bounds when editing the scatterplot.)
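One way to approach part (a) is to add `Axes.set_title`, `Axes.set_xlabel`/`set_ylabel`, and `Axes.set_xlim`/`set_ylim` to a basic scatterplot, taking the limits from `Series.min()` and `Series.max()` as the hint suggests. This is a sketch under stated assumptions: the DataFrame has columns named exactly “bpm” and “danceability”, and the helper name is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_bpm_vs_danceability_labeled(df: pd.DataFrame):
    """Hypothetical helper: labeled scatterplot whose axes span each column's observed range."""
    fig, ax = plt.subplots()
    ax.scatter(df["bpm"], df["danceability"])
    ax.set_title("BPM vs. Danceability")
    ax.set_xlabel("bpm")
    ax.set_ylabel("danceability")
    # Set each axis range from the column's own minimum and maximum.
    ax.set_xlim(df["bpm"].min(), df["bpm"].max())
    ax.set_ylim(df["danceability"].min(), df["danceability"].max())
    return fig, ax


# Example usage (the file name is an assumption):
# df = pd.read_csv("spotify.csv")
# plot_bpm_vs_danceability_labeled(df)
# plt.show()
```

Because the limits equal the data's extremes, the most extreme points sit exactly on the plot's edges, which is what the question specifies; without explicit limits, Matplotlib's default view would add a small margin around the data.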
8. Using Python Pandas, filter the Spotify dataset for the following:
a. Tracks whose artist is Taylor Swift
b. Tracks sung by Taylor Swift and released before 2020
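A minimal pandas sketch using boolean masks. The column names `artist` and `released_year` and both helper function names are assumptions for illustration; replace them with the actual column names in the Spotify CSV. Note the parentheses around each condition in part (b): `&` binds more tightly than `==` and `<` in Python, so they are required.

```python
import pandas as pd


def taylor_swift_tracks(df: pd.DataFrame, artist_col: str = "artist") -> pd.DataFrame:
    """(a) Rows whose artist column is exactly "Taylor Swift" (column name is an assumption)."""
    return df[df[artist_col] == "Taylor Swift"]


def taylor_swift_tracks_before_2020(
    df: pd.DataFrame,
    artist_col: str = "artist",
    year_col: str = "released_year",
) -> pd.DataFrame:
    """(b) Taylor Swift rows with a release year earlier than 2020 (column names are assumptions)."""
    # Combine two boolean conditions; each must be parenthesized.
    mask = (df[artist_col] == "Taylor Swift") & (df[year_col] < 2020)
    return df[mask]
```

Exact equality only matches cells that contain “Taylor Swift” alone. If collaborations list several artists in one cell, `df[artist_col].str.contains("Taylor Swift", regex=False)` would also match those rows; which behavior is intended depends on how the question is read.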