Principles of Data Science

5.2 Components of Time Series Analysis


Learning Outcomes

By the end of this section, you should be able to:

  • 5.2.1 Define the trend curve of a time series.
  • 5.2.2 Define and identify seasonal and cyclic variations in time series data.
  • 5.2.3 Discuss how noise and error affect a time series.

Time series data may be decomposed, or broken up, into several key components. There may be an overall direction (up or down) that the data seems to be following, also known as the trend. Cycles and seasonal patterns may be present. Finally, noise or random variation can often be separated from the rest of the data. Time series analysis begins with identifying and extracting the components that contribute to the structure of the data.

Summarizing, the four main components of a time series are defined as follows:

  1. Trend. The long-term direction of time series data in the absence of any other variation.
  2. Cyclic component. Large variations in the data that recur over longer time periods than seasonal fluctuations but not with a fixed frequency.
  3. Seasonal component (often referred to as seasonality). Variation in the data that occurs at fixed time periods, such as every year in the same months or every week on the same days.
  4. Noise/Random Error. Random, unpredictable variations that occur in the time series that cannot be attributed to any of the other components previously listed. (Note, the term “random” is not the same as statistically independent.)

In practice, the cyclic component is often included with the trend, which is known as the trend-cycle component, since the cyclic component cannot be predicted in the same way that a seasonal component might be. Thus, time series data may be considered to be made up of three components: the trend (or trend-cycle) component, the seasonal component (or components), and noise or random error. Indeed, you might consider that trend, seasonal, and noise components are three separate time series that all contribute to the time series data. Although we still want to think of each component as a sequence of values, it is customary to drop the parentheses in the notation. Thus, the time series will be denoted simply by $x_n$ rather than $(x_n)$, and the three components will be denoted in a similar way.

The components may be combined in different ways in a time series analysis. There are additive decompositions, multiplicative decompositions, and hybrid decompositions.

In a simple additive decomposition, the time series is decomposed into the sum of its three components, $t_n$ for the trend, $s_n$ for the seasonal variation, and $\varepsilon_n$ for the error term, as shown:

$$x_n = t_n + s_n + \varepsilon_n$$
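To make the additive form concrete, the following minimal sketch builds a synthetic series by summing three made-up components. The linear trend, the sine-shaped seasonal pattern, the noise level, and all variable names are assumptions chosen for illustration only; they do not come from any dataset in this section.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n_points = 48  # hypothetical: four years of monthly values
period = 12    # hypothetical seasonal period, in months

# Trend t_n: a straight line (assumed shape).
trend = [100.0 + 0.5 * n for n in range(n_points)]

# Seasonal component s_n: repeats exactly every `period` steps.
seasonal = [10.0 * math.sin(2 * math.pi * n / period) for n in range(n_points)]

# Noise eps_n: Gaussian draws centered at zero.
noise = [random.gauss(0.0, 2.0) for _ in range(n_points)]

# Additive model: x_n = t_n + s_n + eps_n, term by term.
series = [t + s + e for t, s, e in zip(trend, seasonal, noise)]
```

Decomposition runs this construction in reverse: given only `series`, the goal is to estimate the three lists that were added together.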

In a strictly multiplicative decomposition, the time series is decomposed into the product of its components (which will look quite different from their additive counterparts, and so capital letters are used to distinguish them).

$$x_n = T_n \cdot S_n \cdot E_n$$

Finally, a hybrid decomposition in which the trend-cycle and seasonal components are multiplied while the error term remains as an additive component is often useful. In this case, we might write:

$$x_n = T_n \cdot S_n + \varepsilon_n$$

Multiplicative and hybrid decompositions can be more useful if the variation above and below the trend curve seems to be proportional to the level of the trend itself. In other words, when the trend is higher, the variation around the trend is also larger. We will focus only on additive decomposition in this text. In fact, there is a simple transformation using properties of logarithms that changes a strictly multiplicative decomposition into an additive one as long as all of the components consist of positive values.

$$\log(x_n) = \log(T_n \cdot S_n \cdot E_n) = \log(T_n) + \log(S_n) + \log(E_n)$$
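The logarithm identity can be checked numerically. The sketch below uses small, made-up positive values for the three multiplicative components (the numbers and variable names are hypothetical) and confirms that taking logs turns the product into a sum.

```python
import math

# Hypothetical positive components of a multiplicative series.
T = [100.0, 110.0, 121.0, 133.1]  # trend
S = [0.9, 1.2, 1.1, 0.8]          # seasonal factors
E = [1.02, 0.97, 1.01, 0.99]      # error factors

# Strictly multiplicative series: x_n = T_n * S_n * E_n.
x = [t * s * e for t, s, e in zip(T, S, E)]

# Left side: log of the series. Right side: sum of the component logs.
log_x = [math.log(v) for v in x]
log_sum = [math.log(t) + math.log(s) + math.log(e) for t, s, e in zip(T, S, E)]

# The two sides should agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(log_x, log_sum))
```

Note that `math.log` raises an error for zero or negative inputs, which mirrors the requirement that every component be positive.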

Note that regression methods may be used in conjunction with decomposition. This section will cover the fundamentals of decomposition and regression for time series analysis.

The Trend Curve and Trend-Cycle Component

As previously defined, the trend is the direction that the data follows when variations such as seasonal ups and downs and random noise and error are ignored. The trend curve (or trendline) models the trend by a line or curve, representing something like a mean value for the time series in any given time frame. By this definition, the linear regression shown in Example 5.1 would be a trend curve for the S&P 500 Index values. If there is no significant rise or fall in the long term (that is, only short-term fluctuations or patterns exist), then the trend may be estimated by a constant called the level, which is the mean of all the time series data.
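As a rough illustration of these two ideas, the sketch below fits a straight trend line by ordinary least squares and also computes the level as the plain mean. The function name `linear_trend` and the sample values are hypothetical; this is one simple way to estimate a trend, not the only one.

```python
from statistics import mean

def linear_trend(values):
    """Least-squares line through (0, x_0), (1, x_1), ...; returns (slope, intercept)."""
    times = list(range(len(values)))
    t_bar, x_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (x - x_bar) for t, x in zip(times, values))
    slope = sxy / sxx
    intercept = x_bar - slope * t_bar
    return slope, intercept

# Made-up series with a visible upward drift.
values = [12.0, 15.0, 13.0, 18.0, 17.0, 21.0, 20.0, 24.0]

slope, intercept = linear_trend(values)
trend_line = [intercept + slope * t for t in range(len(values))]

# The level is only a sensible trend estimate when the slope is near zero.
level = mean(values)
```

For this made-up series the slope is clearly positive, so the fitted line describes the trend better than the constant level does.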

Visualizing a Trend Curve

There is no single agreed-upon approach to finding the trend of a time series. The mechanics of isolating a trend curve or trend-cycle component are covered in detail in Time Series Forecasting Methods. For now, we shall see by example what a trend curve might look like in a dataset.

Figure 5.5 shows the monthly consumption of coal for generating electricity in the United States from 2016 through 2022, as provided in the dataset MonthlyCoalConsumption.xlsx (source: https://www.statista.com/statistics/1170053/monthly-coal-energy-consumption-in-the-us/. Note: The data was cleaned in the following ways: Month converted to a standard date format from string format; Value converted to numerical data from string format.)

A line chart titled Monthly consumption of coal for electricity generation in the United States from 2016 to 2022. The X axis has months from January 2016 to May 2022. The Y axis ranges from 0 to 80,000. A jagged blue line shows ups and downs with a generally downward trend over time.
Figure 5.5 Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022, in 1000s of Tons (data source: https://www.statista.com/statistics/1170053/monthly-coal-energy-consumption-in-the-us/)

Although there are numerous ups and downs, there is clearly an overall trajectory of the data. If you were given the task to draw a curve that best captures the underlying trend, you might draw something like the curve shown in Figure 5.6. (This is what we refer to as a trendline or trend curve.)

A line chart titled Monthly consumption of coal for electricity generation in the United States from 2016 to 2022. The X axis has months from January 2016 to May 2022. The Y axis ranges from 0 to 80,000. The blue line represents the actual coal consumption, which fluctuates seasonally with peaks in winter and valleys in summer. The orange line represents the trend of coal consumption over time, showing a general downward trend from around 60,000 tons per month in 2016 to around 40,000 tons per month in 2022.
Figure 5.6 Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022, in 1000s of Tons with Trend Curve Shown (data source: https://www.statista.com/statistics/1170053/monthly-coal-energy-consumption-in-the-us/)

Trendline in Excel

Excel has some built-in features that make it easy to add trend curves directly to the graph of a time series. First, make a line graph of the data in the same way as shown in Introduction to Time Series Analysis. (Note: If “Recommended Charts” does not display the type of chart that you want, you can click on the “All Charts” tab and choose another. For Figure 5.7, we chose the line chart rather than the default column chart.) Next, you can click on “Add Chart Element” in the “Chart Design” area of Excel. One of the options is “Trendline” (see Figure 5.7). To get a trendline like the one in Figure 5.6, select “More Options,” choose the “Moving Average” trendline type, and set its period to 12. After adding the trendline, you can change its color and design if you wish.

A screenshot of an Excel worksheet showing how to add a trend line to a chart. The Chart Elements popup menu is highlighted and Trendline is highlighted in a red rectangle in that menu. In the submenu, More Options is highlighted in a red rectangle.
Figure 5.7 Adding a Trendline to a Line Chart in Excel (Used with permission from Microsoft)
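A trendline with a period of 12 can be imitated outside Excel. The sketch below assumes the trendline is a trailing moving average, in which each point is the mean of the current value and the 11 values before it. The function name `trailing_moving_average` and the synthetic monthly series are hypothetical; the series is not the coal consumption dataset.

```python
import math

def trailing_moving_average(values, period):
    """Mean of each value and the (period - 1) values before it.

    Positions without enough history are filled with None.
    """
    smoothed = []
    for i in range(len(values)):
        if i + 1 < period:
            smoothed.append(None)
        else:
            window = values[i + 1 - period : i + 1]
            smoothed.append(sum(window) / period)
    return smoothed

# Synthetic monthly data: a declining line plus a 12-month seasonal wave.
monthly = [50.0 - 0.2 * n + 5.0 * math.sin(2 * math.pi * n / 12) for n in range(36)]

trendline = trailing_moving_average(monthly, 12)
```

Because the window length matches the seasonal period here, each window covers one full seasonal cycle, so the seasonal ups and downs average out and mostly the trend remains.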

Seasonal Variations

We’ve seen that seasonal variations, collectively called seasonality, are predictable changes in the data occurring at regular intervals. The smallest interval of time at which the pattern seems to repeat (at least approximately so) is called its period.

For example, suppose you live in a temperate climate in the northern hemisphere and you record the temperature every day for a period of multiple years. You will find that temperatures are warmer in the summer months and colder in the winter months. There is a natural and predictable pattern of rise and fall on a time scale that repeats every year. This is a classic example of seasonal variation with period 1 year.

However, the period of seasonal variation may be much longer or much shorter than a year. Suppose that you record temperatures near your home every hour for a week or so. Then you will notice a clear repeating cycle of rising and falling temperatures each day—warmer in the daytime and colder each night. This daily cycle of rising and falling temperatures is also a seasonal variation in the time series of period 24 hours. Indeed, any given time series may have multiple seasonal variation patterns at different periods.

Example 5.2

Problem

Walt’s Water Adventures (WWA) is a local company that rents kayaks and canoes for river excursions. Due to mild winters in the area, WWA is able to rent these items year-round. Quarterly revenues from rentals were recorded for three consecutive years and are shown in Table 5.2. Is there a clear seasonal variation in the dataset, and if so, what is the period of the seasonal variation? When do the high points and low points generally occur within each seasonal pattern?

Quarter        Revenues ($)
2021 Spring    1799
2021 Summer    3304
2021 Fall      1279
2021 Winter    999
2022 Spring    2284
2022 Summer    3459
2022 Fall      1964
2022 Winter    1594
2023 Spring    2704
2023 Summer    3849
2023 Fall      2131
2023 Winter    1939
Table 5.2 Quarterly Revenues for Kayak and Canoe Rentals
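One way to explore the questions posed in Example 5.2 is to group the revenues in Table 5.2 by season and by year. The sketch below is an exploratory check, not the textbook's worked solution; the revenue figures are copied from the table, while the variable names are hypothetical.

```python
from statistics import mean

# Quarterly revenues in dollars, copied from Table 5.2.
revenues = {
    2021: {"Spring": 1799, "Summer": 3304, "Fall": 1279, "Winter": 999},
    2022: {"Spring": 2284, "Summer": 3459, "Fall": 1964, "Winter": 1594},
    2023: {"Spring": 2704, "Summer": 3849, "Fall": 2131, "Winter": 1939},
}
seasons = ["Spring", "Summer", "Fall", "Winter"]

# Average revenue for each season across the three years.
season_means = {s: mean(revenues[y][s] for y in revenues) for s in seasons}

# Season with the highest and lowest revenue within each year.
yearly_high = {y: max(q, key=q.get) for y, q in revenues.items()}
yearly_low = {y: min(q, key=q.get) for y, q in revenues.items()}

for year in revenues:
    print(year, "high:", yearly_high[year], "low:", yearly_low[year])
```

If the same season is the high point every year and the same season is the low point every year, that is consistent with a seasonal pattern whose period is four quarters (one year).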

Noise and Random Variation

All data is noisy. There are often complex factors that affect data in ways that are difficult or impossible to predict. Even if all factors could be accounted for, there will inevitably be some error in the data due to mistakes, imprecise measurements, or even how the values are rounded.

Noise shows up in time series data as small random ups and downs without an identifiable pattern. When visualizing the time series, noise is the component of the data that makes the line graph look jagged. In the absence of noise or random variation, the plot of a time series would look fairly smooth.

In a time series, once the trend, cyclic variation, and seasonal variation are all quantified, whatever is “left over” in the data will generally be noise or random fluctuations. In other words, once the trend (or trend-cycle component) $t_n$ and seasonal variation $s_n$ are accounted for, then the noise component is just the remaining component:

$$\varepsilon_n = x_n - t_n - s_n$$

In the real world, we rarely have access to the actual $t_n$ or $s_n$ series directly. Instead, almost all time series forecast methods produce a model $(\hat{x}_n)$ that tries to capture the trend-cycle curve and seasonal variation. The residuals, which are the time series of differences between the given time series $(x_n)$ and the model, provide a measure of the error of the model with respect to the original series. If the model is good, in the sense that it captures as much of the predictable aspects of the time series as possible, then the residuals should give an estimate of the noise component of the time series. Therefore, it is customary to use the same notation for residuals as for the noise term. Thus,

$$\varepsilon_n = x_n - \hat{x}_n$$
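In code, residuals are simply the term-by-term differences between the observed series and the model's values. The short sketch below uses made-up numbers for both lists; in practice the fitted values would come from whatever model was chosen.

```python
from statistics import mean

# Hypothetical observed values x_n and model values x-hat_n.
observed = [102.0, 98.5, 105.0, 110.5]
fitted = [100.0, 100.0, 104.0, 111.0]

# Residual at each time step: observed minus fitted.
residuals = [x - x_hat for x, x_hat in zip(observed, fitted)]

# A mean far from zero would suggest the model is systematically off.
residual_mean = mean(residuals)
```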

The residuals should have the characteristics of a truly random time series, known as white noise, which has the following key characteristics:

  1. Constant zero mean. The mean should be close to zero anywhere in the series.
  2. Constant variance. The variance should be approximately the same anywhere in the series. (Note: Variance is defined in Measures of Variation.)
  3. Uncorrelated. The sequence of values at one point in the series should have a correlation close to zero when compared to any sequence of values of the same size anywhere else in the series. (Note: Correlation is covered in Correlation and Linear Regression Analysis.)

In the definitions given, the phrase anywhere in the series refers to a sample of consecutive terms taken from the series $(x_k, x_{k+1}, x_{k+2}, x_{k+3}, \ldots, x_{k+r})$, which is also called a window. How many terms are included in a window? That depends on the time series, but just keep in mind that the window should be large enough to accurately reflect the mean and variance of nearby points while not being so large that it fails to detect actual ups and downs present in the data. Statistical software can be used to check whether a time series has the characteristics of white noise, and we will discuss the topic further in the next section. For now, check out Figure 5.9, which displays a time series of white noise.

A time series graph of white noise. The X axis ranges from 0 to 250 and the Y axis ranges from -3 to 3. The mean is constantly near zero. Variance is pretty constant everywhere in the series, close to a value of 1. There are no observable patterns that repeat.
Figure 5.9 White Noise. White noise is a type of random noise. The mean is constantly near zero. Variance is pretty constant everywhere in the series, close to a value of 1. There are no observable patterns that repeat.
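The three white noise characteristics can be checked informally on a simulated series. The sketch below generates 250 Gaussian values, then computes the mean and variance in consecutive windows and the lag-1 autocorrelation of the whole series. The window size of 50, the helper names, and the use of lag-1 autocorrelation as a stand-in for the "uncorrelated" condition are all assumptions made for illustration; formal tests are a separate matter.

```python
import random
from statistics import mean, pvariance

def window_stats(values, size):
    """Mean and population variance of each consecutive, non-overlapping window."""
    stats = []
    for start in range(0, len(values) - size + 1, size):
        window = values[start : start + size]
        stats.append((mean(window), pvariance(window)))
    return stats

def autocorrelation(values, lag):
    """Sample autocorrelation of the series with itself shifted by `lag` steps."""
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    num = sum((values[i] - m) * (values[i + lag] - m) for i in range(len(values) - lag))
    return num / denom

random.seed(1)  # fixed seed so the sketch is reproducible
noise = [random.gauss(0.0, 1.0) for _ in range(250)]

for w_mean, w_var in window_stats(noise, 50):
    print(f"window mean {w_mean:+.2f}, window variance {w_var:.2f}")

print(f"lag-1 autocorrelation {autocorrelation(noise, 1):+.2f}")
```

For white noise, the window means should stay near zero, the window variances should stay near one another, and the autocorrelation should be close to zero. Residuals from a fitted model can be passed through the same helpers in place of `noise`.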
© Dec 19, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.