1
.
Discuss how different ratios of training versus testing data can affect the model in terms of underfitting and overfitting. How does the testing set provide a means to identify issues with underfitting and overfitting?
2
.
A university admissions office would like to use a multiple linear regression model with students’ high school GPA and scores on both the SAT and ACT (standardized tests) as input variables to predict whether the student would eventually graduate university if admitted. Assuming the following statements are all accurate, which statement would be a reason not to use multiple linear regression?
-
Students' SAT and ACT scores are often highly correlated with one another.
-
GPA scores are measured on a different scale than either SAT or ACT scores.
-
Scores of 0 are impossible to obtain on the SAT or ACT.
-
Students can have high GPA but sometimes not do well on standardized tests like the SAT or ACT.
3
.
Using the data about words found in news articles from Example 6.12, classify an article that contains all three words, today, disaster, and police, as real or fake.