Chapter Outline
Data science is not just about analyzing datasets. Communicating findings effectively is an essential part of the process, because it helps audiences understand and retain key information. The ability to convey complex information to different audiences in a clear, digestible, and accessible manner is a vital skill for any data scientist. This chapter is designed to equip you with the skills needed to create informative reports that combine analytical rigor with clarity. You will learn how to identify and engage your audience, document the methods that underpin your data science projects, and determine the most compelling ways to present your data, analysis, and findings.
Throughout the chapter, we will use a common dataset, diabetes_data.csv, to demonstrate how Python can help create the elements of a clear and informative report. These data are courtesy of Dr. John Schorling, Department of Medicine, University of Virginia School of Medicine (rdrr.io/cran/ROCit/man/Diabetes.html). The dataset was used in a study published in 1997 to determine the prevalence of coronary heart disease (CHD) risk factors (including factors associated with diabetes) among a population-based sample of Black residents of Virginia. CHD is one of the most common causes of death among Black adults (Willems et al., 1997). The dataset contains information on 403 of the 1,046 subjects interviewed in the study, and it is readily accessible.
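Before building any report element, it helps to confirm that the file loads and to see how many records and columns it contains. The following is a minimal sketch using only Python's standard library. It assumes that diabetes_data.csv sits in the working directory and has a header row; the helper name summarize_csv is hypothetical and introduced here only for illustration.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (row_count, column_names) for a CSV file with a header row.

    Hypothetical helper for illustration; it makes no assumption about
    which columns the file contains.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # Accessing fieldnames reads the header row; it is None for an empty file.
        columns = list(reader.fieldnames or [])
        # Count the remaining data rows (blank lines are skipped by DictReader).
        row_count = sum(1 for _ in reader)
    return row_count, columns


if __name__ == "__main__":
    # Assumption: the dataset has been saved next to this script.
    data_path = Path("diabetes_data.csv")
    if data_path.exists():
        n_rows, columns = summarize_csv(data_path)
        print(f"{n_rows} rows, {len(columns)} columns")
        print(columns)
    else:
        print(f"{data_path} not found; download it before running this sketch")
```

A quick check like this makes it easy to compare the number of rows in your copy of the file against the number of subjects described above before moving on to analysis.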