Dr. Shaun V. Ault; Dr. Soohyun Nam Liao; Larry Musolino

Project A: Comparing k-Means and DBScan

This chapter used two datasets to illustrate k-means clustering and DBScan: FungusLocations.csv and DBScanExample.csv. Run each algorithm on the other dataset; that is, use DBScan to cluster the points of FungusLocations.csv and k-means to cluster the points of DBScanExample.csv. Discuss the results, comparing how well each algorithm performs on each dataset. If either algorithm produced poor clusters, discuss possible reasons why.
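One possible starting point for this comparison is sketched below. It runs scikit-learn's KMeans and DBSCAN on the same set of points and counts how many points land in each cluster. The column layout of the two CSV files is not given here, so the sketch uses a synthetic two-moons dataset as a stand-in; the helper names `cluster_both` and `summarize` and the values of `k`, `eps`, and `min_samples` are assumptions chosen for illustration and would need tuning for the real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons


def cluster_both(points, k=2, eps=0.3, min_samples=5, seed=0):
    """Run k-means and DBSCAN on the same points and return both label arrays.

    The default parameter values are illustrative assumptions, not tuned values.
    """
    kmeans_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(points)
    dbscan_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    return kmeans_labels, dbscan_labels


def summarize(labels):
    """Count the points assigned to each label (DBSCAN marks noise as -1)."""
    values, counts = np.unique(labels, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))


# Stand-in data (an assumption): replace with the two coordinate columns
# loaded from FungusLocations.csv or DBScanExample.csv.
points, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

kmeans_labels, dbscan_labels = cluster_both(points)
print("k-means:", summarize(kmeans_labels))
print("DBSCAN: ", summarize(dbscan_labels))
```

Plotting the points colored by each label array is usually the quickest way to see where the two algorithms disagree, for example where k-means cuts through a curved group or where DBSCAN labels sparse points as noise.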

Project B: Building a Decision Tree to Predict College Completion

Build a decision tree classifier based on the dataset CollegeCompletionData.csv to classify students as likely or unlikely to complete college based on their GPA and in-state status. Experiment with different ratios of training to testing data and different pruning techniques to find the model with the best accuracy. Then generate some new data points with random GPAs and in-state statuses and use your model to predict college completion for them.
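The experiment described above could be organized roughly as follows. This is a sketch under stated assumptions: the column names in CollegeCompletionData.csv are not given here, so the data is synthetic (a made-up rule linking GPA and in-state status to completion), and the helper `evaluate`, the split ratios, and the `ccp_alpha` values (scikit-learn's cost-complexity pruning parameter) are illustrative choices only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption): replace with the GPA, in-state,
# and completion columns loaded from CollegeCompletionData.csv.
n = 400
gpa = rng.uniform(1.0, 4.0, size=n)
in_state = rng.integers(0, 2, size=n)
prob = 1 / (1 + np.exp(-(2.0 * (gpa - 2.5) + 0.5 * in_state)))  # made-up rule
completed = (rng.random(n) < prob).astype(int)
X = np.column_stack([gpa, in_state])


def evaluate(test_size, ccp_alpha, seed=0):
    """Fit a pruned tree on one train/test split and return (tree, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, completed, test_size=test_size, random_state=seed, stratify=completed
    )
    tree = DecisionTreeClassifier(ccp_alpha=ccp_alpha, random_state=seed)
    tree.fit(X_train, y_train)
    return tree, tree.score(X_test, y_test)


# Try several split ratios and pruning strengths (values are illustrative).
results = {}
for test_size in (0.2, 0.3, 0.4):
    for ccp_alpha in (0.0, 0.005, 0.02):
        _, accuracy = evaluate(test_size, ccp_alpha)
        results[(test_size, ccp_alpha)] = accuracy

best = max(results, key=results.get)
best_tree, best_accuracy = evaluate(*best)
print("best (test_size, ccp_alpha):", best, "accuracy:", round(best_accuracy, 3))

# Predict completion for a few randomly generated students.
new_points = np.column_stack([rng.uniform(1.0, 4.0, size=5), rng.integers(0, 2, size=5)])
predictions = best_tree.predict(new_points)
print(predictions)
```

Picking the settings that score best on the test split tends to overstate accuracy, so a separate validation split or cross-validation is worth considering when reporting the final number. Limiting `max_depth` or `min_samples_leaf` are other pruning-style controls to compare against `ccp_alpha`.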

Project C: Predicting Outcomes in Liver Disease Patients

Analyze the dataset cirrhosis.csv to predict the labels $D$, $C$, and $CL$ based on various combinations of the feature columns, using (a) random forest and (b) multiple logistic regression. Compare the accuracy of your models with the Gaussian naïve Bayes classifier that was produced in Gaussian Naive Bayes for Continuous Probabilities.
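A minimal sketch of the model comparison follows. It fits a random forest and a logistic regression on the same three-class problem and reports the mean cross-validated accuracy of each. The feature columns of cirrhosis.csv are not listed here, so the data is a synthetic stand-in with labels named D, C, and CL; the feature count, hyperparameters, and 5-fold setup are assumptions made for illustration. The real file may also need missing values handled and categorical columns encoded before either model will fit.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class stand-in (an assumption): replace X and y with the
# chosen feature columns and the status labels loaded from cirrhosis.csv.
X, y_codes = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)
y = np.array(["D", "C", "CL"])[y_codes]

models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # Scaling helps the logistic regression solver converge; LogisticRegression
    # accepts more than two classes directly.
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

To try different combinations of feature columns, the same loop can be rerun on column subsets of `X`. For a fair comparison with the Gaussian naïve Bayes classifier, it helps to evaluate all three models on the same rows and the same splits.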
