Dr. Shaun V. Ault; Dr. Soohyun Nam Liao; Larry Musolino

1.

For each dataset, list the attributes.

a.

Spotify dataset

b.

CancerDoc dataset

2.

For each attribute below, determine the type of the data based on the following criteria and explain why:

Numeric vs. categorical
If it is numeric, continuous vs. discrete; if it is categorical, nominal vs. ordinal

a.

The “artist_count” attribute of the Spotify dataset

b.

The “mode” attribute of the Spotify dataset

c.

The “key” attribute of the Spotify dataset

d.

The second column of the CancerDoc dataset

3.

For each dataset, identify the type of the dataset—structured vs. unstructured. Explain why.

a.

Spotify dataset

b.

CancerDoc dataset

4.

For each dataset, list the first data entry.

a.

Spotify dataset

b.

CancerDoc dataset

5.

Open the WikiHow dataset (ch1-wikiHow.json) and list the attributes of the dataset.

6.

Draw a scatterplot of bpm (x-axis) versus danceability (y-axis) for the Spotify dataset using:

a.

Python Matplotlib

b.

A spreadsheet program such as MS Excel or Google Sheets (Hint: Search for “Scatterplot” in the program's Help.)

7.

Regenerate the scatterplot of the Spotify dataset, this time with a custom title and custom axis labels. The title should be “BPM vs. Danceability.” The x-axis should be labeled “bpm” and range from the minimum to the maximum bpm value. The y-axis should be labeled “danceability” and range from the minimum to the maximum danceability value.

a.

Python Matplotlib (Hint: The DataFrame.min() and DataFrame.max() methods return the minimum and maximum values of each column of a DataFrame. You can also call these methods on a specific column. For example, if a DataFrame named df has a column named "col1", df["col1"].min() returns the minimum value of the "col1" column.)

b.

A spreadsheet program such as MS Excel or Google Sheets (Hint: Calculate the minimum and maximum values of each column in separate cells first, then use those values when editing the axis ranges of the scatterplot.)
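The Python hint in part (a) can be illustrated with a minimal sketch on a toy DataFrame. The column names "col1" and "col2" and all values below are invented for illustration and are not taken from the Spotify dataset; the sketch only shows one way that column minima and maxima might be passed to Matplotlib's axis-limit methods.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data, invented for illustration (not the Spotify dataset).
df = pd.DataFrame({
    "col1": [3.0, 7.5, 1.2, 9.8],
    "col2": [10, 40, 20, 30],
})

# Minimum and maximum of individual columns, as described in the hint.
x_min, x_max = df["col1"].min(), df["col1"].max()
y_min, y_max = df["col2"].min(), df["col2"].max()

fig, ax = plt.subplots()
ax.scatter(df["col1"], df["col2"])

# Custom title and axis labels.
ax.set_title("col1 vs. col2")
ax.set_xlabel("col1")
ax.set_ylabel("col2")

# Restrict each axis to the observed range of its column.
ax.set_xlim(x_min, x_max)
ax.set_ylim(y_min, y_max)
```

In an interactive session, calling plt.show() afterward displays the figure. Note that with the limits set exactly to the minimum and maximum, the extreme points sit on the edge of the plot area.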

8.

Based on the Spotify dataset, filter the following using Python Pandas:

a.

Tracks whose artist is Taylor Swift

b.

Tracks that were sung by Taylor Swift and released earlier than 2020
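As a reminder of how row filtering works in Pandas, here is a minimal sketch of boolean-mask filtering on a toy DataFrame. The column names "artist", "title", and "year" and every value are hypothetical; the actual column names in the Spotify dataset may differ and should be checked before adapting this pattern.

```python
import pandas as pd

# Toy data with hypothetical column names (not the Spotify dataset).
df = pd.DataFrame({
    "artist": ["Artist A", "Artist B", "Artist A", "Artist C"],
    "title": ["Song 1", "Song 2", "Song 3", "Song 4"],
    "year": [2018, 2021, 2022, 2015],
})

# A single condition: comparing a column to a value yields a boolean mask,
# and indexing the DataFrame with that mask keeps only the True rows.
by_artist = df[df["artist"] == "Artist A"]

# Two conditions combined with & (logical AND). Each condition needs its
# own parentheses because & binds more tightly than == and <.
by_artist_before_2020 = df[(df["artist"] == "Artist A") & (df["year"] < 2020)]

print(by_artist)
print(by_artist_before_2020)
```

Python's `and` keyword does not work element-wise on Pandas columns, which is why the sketch uses `&` instead.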

Critical Thinking