Dr. Shaun V. Ault; Dr. Soohyun Nam Liao; Larry Musolino

Learning Outcomes

By the end of this section, you should be able to:

9.2.1 Create and interpret labeled graphs to visually identify trends in data over time.
9.2.2 Use Python to generate various data visualizations for time series data.

Researchers are frequently interested in analyzing and interpreting data over time. Recall from Time and Series Forecasting that any set of data that consists of numerical measurements of the same variable collected and organized according to regular time intervals may be regarded as time series data. A time series graph allows data scientists to visually identify trends in data over time. For example, a college administrator may track enrollment data from semester to semester, a marketing manager may track quarterly revenue for a business, and an investor may track stock prices on a monthly basis.

In many cases, a time series graph can reveal seasonal patterns or cycles, which is important information to help guide many business decisions, such as planning marketing campaigns or setting optimal inventory levels. Time series graphs also provide important input for forecasting methods, which attempt to predict future trends based on past data.

Refer to Examples of Time Series Data for the basics of visualizing time series data in Excel and Python. The Trend Curve and Trend-Cycle Component explores visualization of a trend curve. Here we’ll explore line charts, trend curves, and the use of Python for time series data visualization in more detail.

Line Charts and Trend Curves

A time series graph is used in many fields, such as finance, health care, and economics, to visualize data measurements over time. To construct a time series graph, the horizontal axis is used to plot the time increments, and the vertical axis is used to plot the values of the variable that we are measuring, such as revenue, enrollment, or inventory level. By doing this, we make each point on the graph correspond to a point in time and a measured quantity. The points on the graph are typically connected by straight lines in the order in which they occur. Once the chart is created, a researcher can identify trends or patterns in the data to help guide decision-making.

Example 9.5

Problem

The following data show the enrollment at a community college over an eight-year time period (Table 9.3). Create a time series graph of the data and comment on any trends.

Year	Enrollment
1	8641
2	9105
3	9361
4	9895
5	9230
6	8734
7	8104
8	7743

Table 9.3 Community College Enrollment

Solution

To create a time series graph for this dataset, start by drawing the horizontal axis and marking the time increments, Years 1 to 8. For the vertical axis, use a numeric scale extending from 4,000 to 12,000, since this covers the span of the enrollment data.
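The choice of axis range can also be made programmatically. The sketch below uses one hypothetical rule that is not part of the original example: round the smallest value down and the largest value up to the nearest 1,000, then add one extra step of padding on each side. The function name `padded_axis_range` and its parameters are illustrative assumptions.

```python
import math


def padded_axis_range(values, step=1000, pad_steps=1):
    """Return (low, high) axis limits that cover every value.

    Hypothetical rule: round the minimum down and the maximum up to a
    multiple of `step`, then widen each end by `pad_steps` steps.
    """
    low = math.floor(min(values) / step) * step - pad_steps * step
    high = math.ceil(max(values) / step) * step + pad_steps * step
    return low, high


# Enrollment values from Table 9.3
enrollment = [8641, 9105, 9361, 9895, 9230, 8734, 8104, 7743]
print(padded_axis_range(enrollment))               # (6000, 11000)
print(padded_axis_range(enrollment, pad_steps=0))  # (7000, 10000)
```

With the default padding, this rule happens to reproduce the 6,000 to 11,000 range that `plt.ylim` uses in Example 9.6; the hand-drawn 4,000 to 12,000 scale is a wider choice that also covers the data.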

Then, for each row in the table, plot the corresponding $(x, y)$ point on the graph. For example, the first row in the table gives the $(x, y)$ data point (1, 8641). To plot this point, locate Year 1 on the horizontal axis and an enrollment value of 8641 on the vertical axis, and place a single point where the two meet. Repeat this process for each row in the table. Since the table has eight rows, the time series graph should show eight points. Finally, connect the points with straight lines in the order in which they occur (see the Python output from Example 9.6).
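In Python, the same procedure amounts to pairing each year with its enrollment and putting the points in time order before they are connected. The sketch below assumes the rows might arrive unsorted; the shuffled order is only an illustration of why sorting matters, not how Table 9.3 is given.

```python
# Rows of Table 9.3 as (year, enrollment) pairs, deliberately listed
# out of order to show why sorting by time matters.
rows = [(3, 9361), (1, 8641), (2, 9105), (4, 9895),
        (8, 7743), (5, 9230), (7, 8104), (6, 8734)]

# Sort by the time value so the connecting lines follow chronological order
points = sorted(rows, key=lambda row: row[0])

# Split the points into separate x and y lists, as plt.plot(x, y) expects
x = [year for year, _ in points]
y = [value for _, value in points]

print(len(points))  # 8
print(x)            # [1, 2, 3, 4, 5, 6, 7, 8]
print(y[0])         # 8641
```

Without the sort, a line chart would join the points in list order, and the line would jump back and forth in time.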

Using Python for Time Series Data Visualization

Data visualization is a very important part of data science, and Python offers extensive graphing capabilities through matplotlib, a widely used third-party package that is installed separately rather than built into Python. To import its pyplot module into a Python program, use the command:

    import matplotlib.pyplot as plt

Matplotlib contains functions such as plot that can be used to generate time series graphs.

After defining the $(x, y)$ data to be graphed, the Python function plt.plot(x, y) can be used to generate the time series chart.
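The same kind of chart can also be built with matplotlib's object-oriented interface, where `plt.subplots()` returns a figure and an axes object whose methods (`ax.plot`, `ax.set_ylim`, and so on) correspond to the `plt` functions. The sketch below is an alternative to the code in Example 9.6, not a copy of it. It assumes the script runs without a display, so it selects the non-interactive Agg backend and saves the chart to a file; the filename `enrollment.png` is a hypothetical choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [8641, 9105, 9361, 9895, 9230, 8734, 8104, 7743]

# Create a figure with a single set of axes
fig, ax = plt.subplots()

# Draw a circular marker at each point and connect the points with lines
ax.plot(x, y, marker="o", linestyle="-")
ax.set_ylim(6000, 11000)

# Format y tick labels with a comma as the thousands separator
ax.yaxis.set_major_formatter(StrMethodFormatter("{x:,.0f}"))

ax.set_xlabel("Year")
ax.set_ylabel("Enrollment")
ax.set_title("Enrollment at a Community College")

# Save the chart to an image file instead of calling plt.show()
fig.savefig("enrollment.png")
plt.close(fig)
```

Calling methods on an explicit `ax` object makes it clear which axes each setting applies to, which helps once a figure contains more than one chart.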

Example 9.6

Problem

Use the plot function in Python to create a time series graph for the dataset shown in Example 9.5.

Solution

In the following Python code, the marker='o' argument to plt.plot draws a circular marker at each $(x, y)$ data point, and linestyle='-' connects the points with solid lines. In addition, plt.ylim sets the range of the y-axis.

Python Code

    # import the matplotlib library
    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    
    # Define the x-data
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    
    # Define the y-data
    y = [8641, 9105, 9361, 9895, 9230, 8734, 8104, 7743]
    
    # Use the plot function to generate a time series graph
    plt.plot(x, y, marker='o', linestyle='-')
    
    # Set the scale for the y-axis
    plt.ylim(6000, 11000)  # This sets the y-axis range from 6000 to 11000
    
    # Define a function to format the ticks with commas as thousands separators
    def format_ticks(value, tick_number):
        return f'{value:,.0f}'
    
    # Apply the custom formatter to the y-axis
    plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(format_ticks))
    
    # Add labels to the x and y axes by using xlabel and ylabel functions
    plt.xlabel("Year")
    plt.ylabel("Enrollment")
    
    # Add a title using the title function
    plt.title("Enrollment at a Community College")
    
    # Show the plot
    plt.show()

The resulting output will look like this:

A line graph labeled Enrollment at a Community College. The x-axis, representing years, ranges from 1 to 8, and the y-axis, representing enrollment, ranges from 6,000 to 11,000. The line through the eight data points shows that enrollment started at about 8,600, gradually increased over Years 1 to 4 to a peak of about 9,900, then decreased from Years 4 to 8 to a low of about 7,700.

The graph indicates that the enrollment gradually increased over years 1 to 4 but then began to decrease from years 4 to 8.
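The visual reading of the chart can be checked numerically. This short sketch, an addition rather than part of the original example, computes the change in enrollment from each year to the next and locates the peak year.

```python
years = [1, 2, 3, 4, 5, 6, 7, 8]
enrollment = [8641, 9105, 9361, 9895, 9230, 8734, 8104, 7743]

# Change from each year to the next (positive = increase)
changes = [later - earlier for earlier, later in zip(enrollment, enrollment[1:])]

# Year with the highest enrollment
peak_year = years[enrollment.index(max(enrollment))]

print(changes)    # [464, 256, 534, -665, -496, -630, -361]
print(peak_year)  # 4
```

Every change up to Year 4 is positive and every change after it is negative, which matches the rise-then-fall pattern seen in the graph.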

Guidelines for Creating Effective Visualizations

You have likely come across poorly constructed graphs that are difficult to interpret or possibly misleading in some way. However, if you follow certain guidelines when creating your visualizations, you can help ensure that they are effective and represent the data fairly.

The first rule is to ensure your graph or chart is clear and easy to understand. You may be familiar with the type of data collected, but your audience may not be. Provide a title and label each axis, including units where appropriate. Provide a data source where appropriate. Use evenly spaced, clearly labeled scales for the axes and avoid distortion (i.e., don't intentionally adjust the scale to produce a desired effect in the graph).

The use of colors to differentiate data can be an important part of graphs and displays. Overall, the use of color can make the visualization more appealing and engaging for the viewer. However, there are some important things to keep in mind when using color. Be sure to apply color consistently to represent the same categories. Ensure that there is contrast between different colors to make them easily distinguishable. For example, if you used various shades of blue for the sectors in a pie chart, it might be difficult for the reader to distinguish among them.

Keep in mind that some viewers may be color-blind, so it’s a good rule of thumb to use color palettes that are accessible to color-blind individuals. The most basic rule for a color-blind palette is to avoid combining green and red.

Exploring Further

Resources for Creating Color-Blind Palettes

The Dataviz Best Practices blog includes tips on creating The Best Charts for Colorblind Viewers.
10 Essential Guidelines for Colorblind Friendly Design offers more information on types of color-blindness and ways to make your visual design accessible for and appealing to color-blind users.

You should also be aware that colors can have different meanings in different cultures. Keep the number of colors used in your visual to a minimum. One rule of thumb is to limit the number of colors to three to five to avoid a color palette that is too confusing.
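One way to follow these color guidelines in code is to build a single category-to-color lookup and reuse it for every chart, so the same category always gets the same color. The sketch below assumes the Okabe-Ito palette, a commonly recommended color-blind-friendly palette that the original text does not name, and the function name `assign_colors` is hypothetical. In matplotlib, the resulting hex strings can be passed to the `color` argument of plotting functions.

```python
# Okabe-Ito palette: a widely used color-blind-friendly set of hex colors
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]


def assign_colors(categories, palette=OKABE_ITO, max_colors=5):
    """Map each category to one fixed color, in first-seen order.

    Raises ValueError if there are more categories than allowed,
    following the three-to-five-color rule of thumb.
    """
    unique = list(dict.fromkeys(categories))  # de-duplicate, keep order
    limit = min(max_colors, len(palette))
    if len(unique) > limit:
        raise ValueError(f"{len(unique)} categories; consider combining some")
    return {cat: palette[i] for i, cat in enumerate(unique)}


colors = assign_colors(["Fall", "Spring", "Summer", "Fall"])
print(colors)  # {'Fall': '#E69F00', 'Spring': '#56B4E9', 'Summer': '#009E73'}
```

Raising an error when there are too many categories is a deliberate design choice: it prompts the chart author to combine categories instead of silently reusing or cycling colors.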

Create the graph with your audience in mind. If the graph is intended for a general audience, then the graph should be as simple as possible and easy to interpret, perhaps with pie charts or bar charts. Don’t include too many categories in your pie chart; if you have more than about five or six categories, you may want to consider a different type of chart. Consider combining categories where appropriate for simplicity. (Sometimes an “other” category can be used to combine categories with very few counts.) If the graph is intended for a more technical audience or for more complicated types of analyses, then a more detailed display such as a scatterplot with regression line and confidence bounds may be appropriate. (Visualizing data with scatterplots is covered in Multivariate and Network Data Visualization Using Python.)
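Combining small categories into an "other" category can be done before the chart is drawn. This is a minimal sketch, assuming the data is a dictionary of category counts; the function name, the cutoff of five categories, and the example counts are illustrative assumptions, not data from the text.

```python
def combine_small_categories(counts, keep=5, other_label="Other"):
    """Keep the `keep` largest categories and merge the rest into one."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    combined = dict(ranked[:keep])
    remainder = sum(count for _, count in ranked[keep:])
    if remainder:
        combined[other_label] = combined.get(other_label, 0) + remainder
    return combined


# Made-up counts, for illustration only
majors = {"Business": 40, "Nursing": 35, "Biology": 20, "Art": 12,
          "History": 8, "Physics": 3, "Music": 2}
result = combine_small_categories(majors)
print(result)
# {'Business': 40, 'Nursing': 35, 'Biology': 20, 'Art': 12, 'History': 8, 'Other': 5}
```

The total count is unchanged, so a pie chart built from the combined dictionary still represents the whole.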

Selecting the Best Type of Visualization

When creating graphs and displays to visualize data, the data scientist must consider what type of graph will work best with the collected data and will best convey the message that the data reveals. This section provides some guidance on the optimal type of graph for different scenarios.

Before we discuss how to choose the best visualization, it is important to consider the type of data being presented. Recall that data can be divided into the broad categories of qualitative data (describes categories or groups) and quantitative data (numeric data) and that quantitative data is further classified as discrete (usually counts) and continuous (usually measurements that can take on any value within a range). You first need to be clear about the type of data being illustrated.

Then, carefully consider the goal of the visualization—that is, what are you trying to accomplish with the graph or display? What is your data telling you? How is it distributed, and what patterns or relationships do you want to highlight? You might be interested in comparing different groups of categories, or you may be interested in showing how data is distributed over some interval. Recall from the previous discussion of correlation and regression that we are often interested in showing correlations or relationships between quantitative variables. Table 9.4 offers some suggestions for picking the most appropriate graphing method based on your goals and type of data.

Goal of Visualization	Type of Data	Visualization Methods to Consider
Show a comparison of different groups or categories	Qualitative or Quantitative	Bar chart, Column chart
Show correlations or relationships between quantitative variables	Quantitative	Scatterplot, Time series graph, Heatmap
Show how a whole is divided into parts	Quantitative	Pie chart
Show how data is distributed over some interval	Quantitative	Histogram, Boxplot

Table 9.4 Summary of Visualization Methods Based on Goals and Data
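Table 9.4 can be read as a small lookup table. The sketch below encodes its rows in Python; the short goal keys such as "show distribution" are hypothetical labels for the table's goals, and the function only returns what the table lists, so it is no substitute for judgment about the audience and the message.

```python
# Table 9.4 as rows of (goal, allowed types of data, charts to consider)
GUIDE = [
    ("compare groups", {"qualitative", "quantitative"},
     ["Bar chart", "Column chart"]),
    ("show relationships", {"quantitative"},
     ["Scatterplot", "Time series graph", "Heatmap"]),
    ("show parts of a whole", {"quantitative"}, ["Pie chart"]),
    ("show distribution", {"quantitative"}, ["Histogram", "Boxplot"]),
]


def suggest_charts(goal, data_type):
    """Return the charts Table 9.4 lists for this goal and type of data."""
    for row_goal, data_types, charts in GUIDE:
        if row_goal == goal and data_type in data_types:
            return charts
    return []  # no row of the table matches this combination


print(suggest_charts("show distribution", "quantitative"))  # ['Histogram', 'Boxplot']
print(suggest_charts("show distribution", "qualitative"))   # []
```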

9.2 Encoding Data That Change Over Time
