Principles of Data Science

9.2 Encoding Data That Change Over Time

Learning Outcomes

By the end of this section, you should be able to:

  • 9.2.1 Create and interpret labeled graphs to visually identify trends in data over time.
  • 9.2.2 Use Python to generate various data visualizations for time series data.

Researchers are frequently interested in analyzing and interpreting data over time. Recall from Time Series and Forecasting that any set of data that consists of numerical measurements of the same variable collected and organized according to regular time intervals may be regarded as time series data. A time series graph allows data scientists to visually identify trends in data over time. For example, a college administrator may want to track enrollment data from semester to semester, a marketing manager may want to track quarterly revenue for a business, and an investor may want to track stock prices on a monthly basis.

In many cases, a time series graph can reveal seasonal patterns or cycles, which is important information to help guide many business decisions, such as planning marketing campaigns or setting optimal inventory levels. Time series graphs also provide important data for forecasting methods, which attempt to predict future trends based on past data.

Refer to Examples of Time Series Data for the basics of visualizing time series data in Excel and Python. The Trend Curve and Trend-Cycle Component explores visualization of a trend curve. Here we’ll explore line charts, trend curves, and the use of Python for time series data visualization in more detail.

Line Charts and Trend Curves

A time series graph is used in many fields, such as finance, health care, and economics, to visualize data measurements over time. To construct a time series graph, plot the time increments on the horizontal axis and the values of the variable being measured (such as revenue, enrollment, or inventory level) on the vertical axis. Each point on the graph then corresponds to a point in time and a measured quantity. The points on the graph are typically connected by straight lines in the order in which they occur. Once the chart is created, a researcher can identify trends or patterns in the data to help guide decision-making.

Example 9.5

Problem

The following data show the enrollment at a community college over an eight-year time period. Create a time series graph of the data and comment on any trends (Table 9.3).

Year Enrollment
1 8641
2 9105
3 9361
4 9895
5 9230
6 8734
7 8104
8 7743
Table 9.3 Community College Enrollment
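
One way to support a comment on the trend is to compute the change in enrollment from each year to the next. The sketch below uses plain Python and the values from Table 9.3; the variable names are chosen here for illustration and do not come from the text.

```python
# Enrollment values from Table 9.3, for years 1 through 8.
years = [1, 2, 3, 4, 5, 6, 7, 8]
enrollment = [8641, 9105, 9361, 9895, 9230, 8734, 8104, 7743]

# Year-over-year change: positive means growth, negative means decline.
changes = [later - earlier for earlier, later in zip(enrollment, enrollment[1:])]

# Year with the highest enrollment.
peak_year = years[enrollment.index(max(enrollment))]

for year, change in zip(years[1:], changes):
    print(f"Year {year - 1} to {year}: {change:+d}")
print(f"Peak enrollment of {max(enrollment)} occurs in year {peak_year}")
```

The changes are positive through year 4 and negative in every year after that, so enrollment rises to a peak in year 4 and then declines.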

Using Python for Time Series Data Visualization

Data visualization is a very important part of data science, and Python offers extensive graphing capabilities through a third-party package called Matplotlib (imported as matplotlib). To import its plotting interface into a Python program, use the command:

import matplotlib.pyplot as plt

Matplotlib contains functions such as plot that can be used to generate time series graphs.

After defining the (x, y) data to be graphed, the Python function plt.plot(x, y) can be used to generate the time series chart.

Example 9.6

Problem

Use the plot function in Python to create a time series graph for the dataset shown in Example 9.5.
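
A minimal sketch of one possible approach follows. It uses plt.plot(x, y) with the data from Table 9.3; the marker style, the title and label wording, and the output file name enrollment.png are choices made here for illustration.

```python
import matplotlib.pyplot as plt

# Data from Table 9.3 (Example 9.5).
x = [1, 2, 3, 4, 5, 6, 7, 8]                          # year
y = [8641, 9105, 9361, 9895, 9230, 8734, 8104, 7743]  # enrollment

# Connect the points in time order; marker="o" also marks each observation.
plt.plot(x, y, marker="o")
plt.title("Community College Enrollment")
plt.xlabel("Year")
plt.ylabel("Enrollment")

# Save the chart to a file (hypothetical name); in an interactive
# session, plt.show() would display it in a window instead.
plt.savefig("enrollment.png")
```

The resulting line chart places the years on the horizontal axis and enrollment on the vertical axis, with consecutive points joined by straight line segments.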

Guidelines for Creating Effective Visualizations

You have likely come across poorly constructed graphs that are difficult to interpret or possibly misleading in some way. Following a few guidelines when creating your visualizations will help make them effective and help them represent the data fairly.

The first rule is to ensure your graph or chart is clear and easy to understand. You may be familiar with the type of data collected, but your audience may not be. Provide a title and label each axis, including units where appropriate. Provide a data source where appropriate. Use appropriate scales for the axes and avoid distortion (i.e., don’t intentionally adjust the scale to produce a desired effect in the graph).
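
To illustrate how the choice of axis scale changes the impression a graph gives, the sketch below draws the enrollment data from Table 9.3 twice: once with a vertical axis that hugs the data and once with a vertical axis that starts at zero. The axis limits, the unit label "students," and the output file name are assumptions made here for illustration. Neither scale is always the right one; the point is to choose it deliberately and make the range clear to the reader.

```python
import matplotlib.pyplot as plt

# Data from Table 9.3.
years = [1, 2, 3, 4, 5, 6, 7, 8]
enrollment = [8641, 9105, 9361, 9895, 9230, 8734, 8104, 7743]

fig, (ax_zoomed, ax_full) = plt.subplots(1, 2, figsize=(10, 4))

# Both panels show the same data with the same labels.
for ax in (ax_zoomed, ax_full):
    ax.plot(years, enrollment, marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Enrollment (students)")

# Left panel: limits close to the data make the changes look large.
ax_zoomed.set_ylim(7500, 10000)
ax_zoomed.set_title("Vertical axis starts at 7,500")

# Right panel: starting at zero shows changes relative to total enrollment.
ax_full.set_ylim(0, 10000)
ax_full.set_title("Vertical axis starts at 0")

fig.suptitle("Community College Enrollment (Table 9.3)")
fig.tight_layout()
fig.savefig("enrollment_axis_comparison.png")  # hypothetical file name
```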

The use of colors to differentiate data can be an important part of graphs and displays. Overall, the use of color can make the visualization more appealing and engaging for the viewer. However, there are some important things to keep in mind when using color. Be sure to apply color consistently to represent the same categories. Ensure that there is contrast between different colors to make them easily distinguishable. For example, if you used various shades of blue for the sectors in a pie chart, it might be difficult for the reader to distinguish among them.

Keep in mind that some viewers may be color-blind, so it’s a good rule of thumb to use color palettes that are accessible to color-blind individuals. The most basic rule for a color-blind palette is to avoid combining green and red.

Exploring Further

Resources for Creating Color-Blind Palettes

You should also be aware that colors can have different meanings in different cultures. Keep the number of colors used in your visual to a minimum. One rule of thumb is to limit the number of colors to three to five to avoid a color palette that is too confusing.
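
As one possible starting point, Matplotlib ships a style named tableau-colorblind10 whose color cycle is intended to be distinguishable by color-blind viewers. The sketch below reads the first three colors from that style and draws a swatch for each so their contrast can be checked by eye. The limit of three colors, the swatch layout, and the output file name are choices made here, and the bars carry no data.

```python
import matplotlib.pyplot as plt

MAX_COLORS = 3  # stay within the three-to-five rule of thumb

# Read the color cycle defined by Matplotlib's bundled style.
with plt.style.context("tableau-colorblind10"):
    palette = plt.rcParams["axes.prop_cycle"].by_key()["color"][:MAX_COLORS]

# Draw one equal-height swatch per color (a preview only, not a data chart).
fig, ax = plt.subplots(figsize=(4, 1.5))
ax.bar(range(len(palette)), [1] * len(palette), color=palette)
ax.set_xticks(range(len(palette)))
ax.set_xticklabels(palette)
ax.set_yticks([])
ax.set_title("Preview of a color-blind-friendly palette")
fig.savefig("palette_preview.png")  # hypothetical file name
```

The same palette list can then be passed, one color per category, to the color argument of plotting functions so that each category keeps the same color across charts.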

Create the graph with your audience in mind. If the graph is intended for a general audience, then the graph should be as simple as possible and easy to interpret, perhaps with pie charts or bar charts. Don’t include too many categories in your pie chart; if you have more than about five or six categories, you may want to consider a different type of chart. Consider combining categories where appropriate for simplicity. (Sometimes an “other” category can be used to combine categories with very few counts.) If the graph is intended for a more technical audience or for more complicated types of analyses, then a more detailed display such as a scatterplot with regression line and confidence bounds may be appropriate. (Visualizing data with scatterplots is covered in Multivariate and Network Data Visualization Using Python.)
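
The idea of combining small categories into an “other” category can be sketched as a small helper function. The function name, its parameters, and the sample counts below are hypothetical and invented only for illustration; the cutoff of five slices follows the rule of thumb above.

```python
def combine_small_categories(counts, max_slices=5, other_label="Other"):
    """Keep the largest categories and merge the remainder into one slice."""
    if len(counts) <= max_slices:
        return dict(counts)
    # Rank categories from largest to smallest count.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # Reserve one slice for the merged category.
    kept = dict(ranked[: max_slices - 1])
    kept[other_label] = sum(value for _, value in ranked[max_slices - 1:])
    return kept


# Hypothetical counts, made up only to exercise the function.
example_counts = {"A": 40, "B": 25, "C": 15, "D": 8, "E": 5, "F": 4, "G": 3}
combined = combine_small_categories(example_counts, max_slices=5)
print(combined)
```

The total count is unchanged by the merge, so a pie chart built from the combined dictionary still represents the whole.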

Selecting the Best Type of Visualization

When creating graphs and displays to visualize data, the data scientist must consider what type of graph will work best with the collected data and will best convey the message that the data reveals. This section provides some guidance on the optimal type of graph for different scenarios.

Before we discuss how to choose the best visualization, it is important to consider the type of data being presented. Recall that data can be divided into the broad categories of qualitative data (describes categories or groups) and quantitative data (numeric data) and that quantitative data is further classified as discrete (usually counts) and continuous (usually measurements that can take on any value within a range). You first need to be clear about the type of data being illustrated.

Then, carefully consider the goal of the visualization; that is, what are you trying to accomplish with the graph or display? What is your data telling you? How is it distributed, and what patterns or relationships do you want to highlight? You might be interested in comparing different groups or categories, or you may be interested in showing how data is distributed over some interval. Recall from the previous discussion of correlation and regression that we are often interested in showing correlations or relationships between quantitative variables. Table 9.4 offers some suggestions for picking the most appropriate graphing method based on your goals and type of data.

Goal of Visualization | Type of Data | Visualization Methods to Consider
Show a comparison of different groups or categories | Qualitative or Quantitative | Bar chart (useful for comparing different groups of categories); column chart
Show correlations or relationships between quantitative variables | Quantitative | Scatterplot; time series graph; heatmap
Show how a whole is divided into parts | Quantitative | Pie chart
Show how data is distributed over some interval | Quantitative | Histogram; boxplot
Table 9.4 Summary of Visualization Methods Based on Goals and Data
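
The suggestions in Table 9.4 can also be kept as a small lookup for use in a script. This is only a sketch: the dictionary keys are shorthand invented here for the four goals, and the function name is hypothetical.

```python
# Suggestions transcribed from Table 9.4; the keys are shorthand chosen here.
CHART_SUGGESTIONS = {
    "comparison": ["bar chart", "column chart"],
    "relationship": ["scatterplot", "time series graph", "heatmap"],
    "part_of_whole": ["pie chart"],
    "distribution": ["histogram", "boxplot"],
}


def suggest_charts(goal):
    """Return the chart types Table 9.4 lists for a goal."""
    try:
        return CHART_SUGGESTIONS[goal]
    except KeyError:
        valid = sorted(CHART_SUGGESTIONS)
        raise ValueError(f"Unknown goal {goal!r}; expected one of {valid}") from None


print(suggest_charts("distribution"))
```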

© Dec 19, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.