Dr. Shaun V. Ault; Dr. Soohyun Nam Liao; Larry Musolino

A trail in a forest with two individuals seated on three-wheeled mountain bikes using hand cycles. Two more individuals stand by a two-wheeled bike, with a measuring device attached to the wheel.

Figure 9.1 Data visualization techniques can involve collecting and analyzing geospatial data, such as measurements collected by the National Park Service during the High-Efficiency Trail Assessment Process (HETAP). (credit: modification of work "Trail Accessibility Assessments" by GlacierNPS/Flickr, Public Domain)

Chapter Outline

9.1 Encoding Univariate Data

9.2 Encoding Data That Change Over Time

9.3 Graphing Probability Distributions

9.4 Geospatial and Heatmap Data Visualization Using Python

9.5 Multivariate and Network Data Visualization Using Python

Data visualization serves as an effective strategy for detecting patterns, trends, and relationships within complex datasets. By representing data graphically, analysts can deduce relationships, dependencies, and outlier behaviors that might otherwise remain hidden in the raw numbers or tables. This visual exploration not only aids in understanding the underlying structure of the data but also helps to detect trends and correlations and leads data scientists to more informed decision-making processes.

Data visualization can enhance communication and comprehension across diverse audiences, including decision-makers in organizations, clients, and fellow data scientists. Through intuitive charts, graphs, and interactive dashboards, complex insights can be conveyed in a clear, accessible, and interactive manner. Effective data visualization transforms raw data into easier-to-understand graphical representations, which then helps to facilitate and share insights into analytical conclusions. A data scientist can help generate data presentations for management teams or other decision-makers, and these visualizations serve as a tool to aid and support decision-making.

In addition, data visualizations are instrumental in examining models, evaluating hypotheses, and refining analytical methodologies. By visually inspecting model outputs, data scientists can assess the accuracy and robustness of predictive algorithms, identify areas for improvement, and iteratively refine and improve their models. Visualization techniques such as histogram, time series charts, heatmaps, scatterplots, and geospatial maps offer important insights into model performance and predictive ability, which allows data scientists to refine and improve models with the result of improved predictive accuracy and reliability. Data visualization is a key component of the data science lifecycle and is an important tool used by data scientists and researchers.

Throughout this text we have discussed the concept of data visualization and have used technology (primarily Python) to help generate appropriate visual output such as graphs, charts, and maps. In Descriptive Statistics: Statistical Measurements and Probability Distributions, we introduced graphs of probability distributions such as binomial and normal distributions. Then, in Inferential Statistics and Regression Analysis, we introduced and worked with scatterplots for bivariate data. In Time and Series Forecasting, methods to visualize time series data and trend curves were reviewed. In this chapter, we review some of the basic visualization tools and extend the discussion to more advanced techniques such as heatmaps and visualizing geospatial and three-dimensional data. Reporting Results will provide strategies for including data visualization in executive reports and executive summaries.

In this chapter, we will review the basic techniques for data visualization and provide more detail on creating line charts and trend curves, with and without Python. We also explore techniques such as geospatial and multivariate data analysis that can uncover hidden dependencies or hidden relationships that are critical to researchers, data scientists, and statisticians.

Introduction

Chapter Outline