
Project A: Comparing k-Means and DBScan

In this chapter, two datasets were used to illustrate k-means clustering and DBScan: FungusLocations.csv and DBScanExample.csv. Run each algorithm on the other dataset. In other words, use DBScan to cluster the points of FungusLocations.csv and k-means to cluster the points of DBScanExample.csv. Discuss the results, comparing the performance of both algorithms on each dataset. If there were any issues with the resulting clusters, discuss possible reasons why those issues came up.
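
One possible starting point for the comparison is to run both algorithms on the same set of points and summarize the labels each one returns. The sketch below uses scikit-learn on synthetic two-crescent data as a stand-in, because the column layout of the two CSV files is not given here. The values of `n_clusters`, `eps`, and `min_samples` are illustrative assumptions, not values taken from the chapter, and the helper `summarize` is a hypothetical name introduced only for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the project data (assumption): two interleaved
# crescents. Replace X with the coordinate columns of the real CSV file.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

# k-means needs the number of clusters up front (2 is an assumption here).
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN finds the number of clusters itself; the label -1 marks noise points.
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)


def summarize(name, labels):
    """Print and return (number of clusters, number of noise points)."""
    n_clusters = len(set(labels) - {-1})
    n_noise = int(np.sum(labels == -1))
    print(f"{name}: {n_clusters} clusters, {n_noise} noise points")
    return n_clusters, n_noise


kmeans_summary = summarize("k-means", kmeans_labels)
dbscan_summary = summarize("DBScan", dbscan_labels)
```

With the real files, the same two calls can be applied after loading the coordinates (for example with `pandas.read_csv`). A scatter plot colored by each label array usually makes the differences between the two clusterings easier to discuss than the counts alone.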

Project B: Building a Decision Tree to Predict College Completion

Build a decision tree classifier based on the dataset CollegeCompletionData.csv to classify students as likely to complete college (or not) based on their GPA and in-state status. Experiment with different ratios of training to testing data and different pruning techniques to find a model with the best accuracy. Generate some data points with random GPAs and in-state statuses and use your model to predict college completion on your new data.
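
A minimal sketch of the experiment loop might look like the following. It assumes scikit-learn and uses randomly generated data in place of CollegeCompletionData.csv, since the file's actual column names and encodings are not shown here; the completion rule used to create the labels is made up purely so the tree has something to learn. Cost-complexity pruning (`ccp_alpha`) is just one pruning technique; limiting `max_depth` or `min_samples_leaf` are alternatives.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for CollegeCompletionData.csv (assumption): the real
# column names, encodings, and value ranges may differ.
n = 400
gpa = rng.uniform(1.0, 4.0, size=n)
in_state = rng.integers(0, 2, size=n)  # 1 = in-state, 0 = out-of-state
# Made-up labeling rule with noise, only so the example is learnable.
noise = rng.normal(0.0, 0.5, size=n)
completed = ((gpa + 0.3 * in_state + noise) > 2.8).astype(int)
X = np.column_stack([gpa, in_state])

results = []
for test_size in (0.2, 0.3, 0.4):  # fraction of data held out for testing
    X_train, X_test, y_train, y_test = train_test_split(
        X, completed, test_size=test_size, random_state=0, stratify=completed
    )
    for ccp_alpha in (0.0, 0.005, 0.02):  # larger value = heavier pruning
        tree = DecisionTreeClassifier(ccp_alpha=ccp_alpha, random_state=0)
        tree.fit(X_train, y_train)
        accuracy = tree.score(X_test, y_test)
        results.append((accuracy, test_size, ccp_alpha, tree))

# Pick the configuration with the highest test accuracy.
best_acc, best_test_size, best_alpha, best_tree = max(results, key=lambda r: r[0])
print(f"best accuracy {best_acc:.3f} "
      f"(test_size={best_test_size}, ccp_alpha={best_alpha})")

# Predict completion for a few new, randomly generated students.
new_students = np.column_stack(
    [rng.uniform(1.0, 4.0, size=5), rng.integers(0, 2, size=5)]
)
predictions = best_tree.predict(new_students)
print(np.column_stack([new_students.round(2), predictions]))
```

Note that choosing the model by its test accuracy tends to give an optimistic estimate; holding out a separate validation set or using cross-validation would be a more careful way to compare settings.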

Project C: Predicting Outcomes in Liver Disease Patients

Analyze the dataset cirrhosis.csv to predict labels D, C, and CL based on various combinations of the feature columns, using (a) random forest and (b) multiple logistic regression. Compare the accuracy of your models with the Gaussian naïve Bayes classifier that was produced in Gaussian Naive Bayes for Continuous Probabilities.
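
As a rough template for the comparison, the sketch below fits a random forest, a multiclass logistic regression, and a Gaussian naive Bayes baseline on the same train/test split and reports each accuracy. It assumes scikit-learn and uses synthetic three-class data as a placeholder for cirrhosis.csv; the feature count, the hyperparameters, and the mapping of generated classes to the label strings are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class stand-in for cirrhosis.csv (assumption). With the
# real data, X would hold the chosen feature columns and y the label column.
X, y_int = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)
y = np.array(["D", "C", "CL"])[y_int]  # placeholder label names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # Scaling the features first helps the logistic regression solver converge.
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Gaussian naive Bayes": GaussianNB(),
}

# Fit every model on the same split so the accuracies are comparable.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)
    print(f"{name}: {accuracies[name]:.3f}")
```

To try different combinations of feature columns, the same loop can be rerun with `X` restricted to a subset of columns, keeping the split and random seeds fixed so that differences in accuracy reflect the features rather than the sampling.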

Citation/Attribution

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution-NonCommercial-ShareAlike License and you must attribute OpenStax.

Attribution information
  • If you are redistributing all or part of this book in a print format, then you must include on every physical page the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
  • If you are redistributing all or part of this book in a digital format, then you must include on every digital page view the following attribution:
    Access for free at https://openstax.org/books/principles-data-science/pages/1-introduction
Citation information

© Dec 19, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.