Learning Outcomes
By the end of this section, you should be able to:
- 5.2.1 Define the trend curve of a time series.
- 5.2.2 Define and identify seasonal and cyclic variations in time series data.
- 5.2.3 Discuss how noise and error affect a time series.
Time series data may be decomposed, or broken up, into several key components. There may be an overall direction (up or down) that the data seems to be following, also known as the trend. Cycles and seasonal patterns may be present. Finally, noise or random variation can often be separated from the rest of the data. Time series analysis begins with identifying and extracting the components that contribute to the structure of the data.
Summarizing, the four main components of a time series are defined as follows:
- Trend. The long-term direction of time series data in the absence of any other variation.
- Cyclic component. Large variations in the data that recur over longer time periods than seasonal fluctuations but not with a fixed frequency.
- Seasonal component (often referred to as seasonality). Variation in the data that occurs at fixed time periods, such as every year in the same months or every week on the same days.
- Noise/Random Error. Random, unpredictable variations that occur in the time series that cannot be attributed to any of the other components previously listed. (Note, the term “random” is not the same as statistically independent.)
In practice, the cyclic component is often combined with the trend to form what is known as the trend-cycle component, since the cyclic component cannot be predicted in the same way that a seasonal component might be. Thus, time series data may be considered to be made up of three components: the trend (or trend-cycle) component, the seasonal component (or components), and noise or random error. Indeed, you might consider the trend, seasonal, and noise components to be three separate time series that all contribute to the time series data. Although we still want to think of each component as a sequence of values, it is customary to drop the parentheses in the notation. Thus, the time series will be denoted simply by $x_n$ rather than $(x_n)$, and the three components will be denoted in a similar way.
The way in which the components are combined varies from one time series analysis to another. There are additive decompositions, multiplicative decompositions, and hybrid decompositions.
In a simple additive decomposition, the time series $x_n$ is decomposed into the sum of its three components, $t_n$ for the trend, $s_n$ for the seasonal variation, and $\epsilon_n$ for the error term, as shown:
$$x_n = t_n + s_n + \epsilon_n$$
In a strictly multiplicative decomposition, the time series is decomposed into the product of its components (which will look quite different from their additive counterparts, and so capital letters are used to distinguish them):
$$x_n = T_n \times S_n \times E_n$$
Finally, a hybrid decomposition in which the trend-cycle and seasonal components are multiplied while the error term remains as an additive component is often useful. In this case, we might write:
$$x_n = T_n \times S_n + \epsilon_n$$
Multiplicative and hybrid decompositions can be more useful if the variation above and below the trend curve seems to be proportional to the level of the trend itself. In other words, when the trend is higher, the variation around the trend is also larger. We will focus only on additive decomposition in this text. In fact, there is a simple transformation using properties of logarithms that changes a strictly multiplicative decomposition into an additive one as long as all of the components consist of positive values.
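The difference between combining components by addition and by multiplication, and the effect of the logarithm, can be illustrated with a minimal sketch. All of the component values below are hypothetical numbers chosen for illustration, and the variable names are not notation from this text.

```python
import math

# Hypothetical components for 8 quarters (illustrative values, not real data).
trend = [100 + 5 * n for n in range(8)]         # steadily rising trend
seasonal_add = [10, 25, -15, -20] * 2           # additive seasonal offsets, period 4
seasonal_mult = [1.05, 1.20, 0.90, 0.85] * 2    # multiplicative seasonal factors, period 4
noise_add = [1.5, -2.0, 0.5, 1.0, -1.5, 2.0, -0.5, -1.0]
noise_mult = [1.01, 0.98, 1.00, 1.02, 0.99, 1.01, 1.00, 0.99]

# Additive decomposition: series = trend + seasonal + noise
additive = [t + s + e for t, s, e in zip(trend, seasonal_add, noise_add)]

# Multiplicative decomposition: series = trend * seasonal * noise
multiplicative = [t * s * e for t, s, e in zip(trend, seasonal_mult, noise_mult)]

# Taking logarithms turns the product into a sum. This requires every
# component value to be positive:
#   log(series) = log(trend) + log(seasonal) + log(noise)
log_series = [math.log(x) for x in multiplicative]
log_sum = [math.log(t) + math.log(s) + math.log(e)
           for t, s, e in zip(trend, seasonal_mult, noise_mult)]

largest_gap = max(abs(a - b) for a, b in zip(log_series, log_sum))
print("largest difference between the two sides:", largest_gap)
```

The printed difference should be zero up to floating-point rounding, which is why a multiplicative series with positive components can be analyzed additively after a log transformation.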
Note that regression methods may be used in conjunction with decomposition. This section will cover the fundamentals of decomposition and regression for time series analysis.
The Trend Curve and Trend-Cycle Component
As previously defined, the trend is the direction that the data follows when variations such as seasonal ups and downs and random noise and error are ignored. The trend curve (or trendline) models the trend by a line or curve, representing something like a mean value for the time series in any given time frame. By this definition, the linear regression shown in Example 5.1 would be a trend curve for the S&P 500 Index values. If there is no significant rise or fall in the long term (that is, only short-term fluctuations or patterns exist), then the trend may be estimated by a constant called the level, which is the mean of all the time series data.
Visualizing a Trend Curve
There is no single agreed-upon approach to finding the trend of a time series. The mechanics of isolating a trend curve or trend-cycle component are covered in detail in Time Series Forecasting Methods. For now, we shall see by example what a trend curve might look like in a dataset.
Figure 5.5 shows the monthly consumption of coal for generating electricity in the United States from 2016 through 2022, as provided in the dataset MonthlyCoalConsumption.xlsx (source: https://www.statista.com/statistics/1170053/monthly-coal-energy-consumption-in-the-us/). Note: The data was cleaned in the following ways: Month was converted from string format to a standard date format, and Value was converted from string format to numerical data.
Although there are numerous ups and downs, there is clearly an overall trajectory of the data. If you were given the task to draw a curve that best captures the underlying trend, you might draw something like the curve shown in Figure 5.6. (This is what we refer to as a trendline or trend curve.)
Trendline in Excel
Excel has some built-in features that make it easy to add trend curves directly to the graph of a time series. First, make a line graph of the data in the same way as shown in Introduction to Time Series Analysis. (Note: If “Recommended Charts” does not display the type of chart that you want, you can click on the “All Charts” tab and choose another. For Figure 5.7, we chose the line chart rather than the default column chart.) Next, click on “Add Chart Element” in the “Chart Design” area of Excel. One of the options is “Trendline” (see Figure 5.7). To get a trendline like the one in Figure 5.6, select “More Options,” choose the moving average type of trendline, and set its period to 12. After adding the trendline, you can change its color and design if you wish.
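For readers working outside Excel, a trendline with period 12 can be approximated by a moving average. The sketch below assumes a trailing window (each point is averaged with the 11 points before it). The function name `trailing_moving_average` and the monthly data are hypothetical, and the data is not the coal dataset.

```python
import math

def trailing_moving_average(values, period):
    """Average each value with the (period - 1) values before it.

    Returns a list the same length as `values`; the first (period - 1)
    entries are None because a full window is not yet available.
    """
    averages = [None] * (period - 1)
    for end in range(period, len(values) + 1):
        window = values[end - period:end]
        averages.append(sum(window) / period)
    return averages

# Hypothetical monthly data: a falling straight-line trend plus a
# repeating 12-month pattern.
months = range(48)
series = [100 - 0.5 * m + 10 * math.sin(2 * math.pi * m / 12) for m in months]

smoothed = trailing_moving_average(series, period=12)
print([round(v, 2) for v in smoothed if v is not None][:5])
```

Because the window length matches the 12-month pattern, the repeating ups and downs cancel within each window, and what remains follows the underlying trend.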
Seasonal Variations
We’ve seen that seasonality, or seasonal variation, refers to predictable changes in the data occurring at regular intervals. The smallest interval of time at which the pattern seems to repeat (at least approximately) is called its period.
For example, suppose you live in a temperate climate in the northern hemisphere and you record the temperature every day over several years. You will find that temperatures are warmer in the summer months and colder in the winter months. There is a natural and predictable pattern of rise and fall that repeats every year. This is a classic example of seasonal variation with period 1 year.
However, the period of seasonal variation may be much longer or much shorter than a year. Suppose that you record temperatures near your home every hour for a week or so. Then you will notice a clear repeating cycle of rising and falling temperatures each day, warmer in the daytime and colder each night. This daily cycle of rising and falling temperatures is also a seasonal variation in the time series, with period 24 hours. Indeed, any given time series may have multiple seasonal variation patterns at different periods.
Example 5.2
Problem
Walt’s Water Adventures is a local company that rents kayaks and canoes for river excursions. Due to mild winters in the area, WWA is able to rent these items year-round. Quarterly revenues from rentals were recorded for three consecutive years and are shown in Table 5.2. Is there a clear seasonal variation in the dataset, and if so, what is the period of the seasonal variation? When do the high points and low points generally occur within each seasonal pattern?
Quarter | Revenues ($) |
---|---|
2021 Spring | 1799 |
2021 Summer | 3304 |
2021 Fall | 1279 |
2021 Winter | 999 |
2022 Spring | 2284 |
2022 Summer | 3459 |
2022 Fall | 1964 |
2022 Winter | 1594 |
2023 Spring | 2704 |
2023 Summer | 3849 |
2023 Fall | 2131 |
2023 Winter | 1939 |
Solution
The period is 1 year, or 4 quarters, with high values occurring in summer of each year and low values occurring in winter of each year as shown in Figure 5.8.
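The same conclusion can be checked directly from the values in Table 5.2. The following is a minimal sketch that finds the highest and lowest quarter within each year. The variable names are illustrative only.

```python
# Quarterly revenues ($) from Table 5.2, in the order Spring, Summer, Fall, Winter.
quarters = ["Spring", "Summer", "Fall", "Winter"]
revenues = {
    2021: [1799, 3304, 1279, 999],
    2022: [2284, 3459, 1964, 1594],
    2023: [2704, 3849, 2131, 1939],
}

# Find the quarter with the highest and the lowest revenue in each year.
peaks = {}
troughs = {}
for year, values in revenues.items():
    peaks[year] = quarters[values.index(max(values))]
    troughs[year] = quarters[values.index(min(values))]
    print(year, "high:", peaks[year], "low:", troughs[year])
```

If the high and low quarters are the same in every year, the pattern repeats every 4 quarters, which supports a period of 1 year.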
Noise and Random Variation
All data is noisy. There are often complex factors that affect data in ways that are difficult or impossible to predict. Even if all factors could be accounted for, there will inevitably be some error in the data due to mistakes, imprecise measurements, or even how the values are rounded.
Noise shows up in time series data as small random ups and downs without an identifiable pattern. When visualizing the time series, noise is the component of the data that makes the line graph look jagged. In the absence of noise or random variation, the plot of a time series would look fairly smooth.
In a time series, once the trend, cyclic variation, and seasonal variation are all quantified, whatever is “left over” in the data will generally be noise or random fluctuations. In other words, once the trend (or trend-cycle component) $t_n$ and the seasonal variation $s_n$ are accounted for in an additive decomposition, the noise component $\epsilon_n$ is just what remains of the series $x_n$:
$$\epsilon_n = x_n - t_n - s_n$$
In the real world, we rarely have access to the actual trend series $t_n$ or seasonal series $s_n$ directly. Instead, almost all time series forecasting methods produce a model that tries to capture the trend-cycle curve and seasonal variation. The residuals, which are the time series of differences between the given time series and the model, provide a measure of the error of the model with respect to the original series. If the model is good, in the sense that it captures as much of the predictable aspects of the time series as possible, then the residuals should give an estimate of the noise component of the time series. Therefore, it is customary to use the same notation for residuals as for the noise term. Thus, if $\hat{x}_n$ denotes the value of the model at time step $n$,
$$\epsilon_n = x_n - \hat{x}_n$$
The residuals should behave like a truly random time series, known as white noise, which has the following key characteristics:
- Constant zero mean. The mean should be close to zero anywhere in the series.
- Constant variance. The variance should be approximately the same anywhere in the series. (Note: Variance is defined in Measures of Variation.)
- Uncorrelated. The sequence of values at one point in the series should have a correlation close to zero when compared to any sequence of values of the same size anywhere else in the series. (Note: Correlation is covered in Correlation and Linear Regression Analysis.)
In the definitions given, the phrase “anywhere in the series” refers to a sample of consecutive terms taken from the series, which is also called a window. How many terms are included in a window? That depends on the time series, but just keep in mind that the window should be large enough to accurately reflect the mean and variance of nearby points while not being so large that it fails to detect actual ups and downs present in the data. Statistical software can be used to check whether a time series has the characteristics of white noise, and we will discuss the topic further in the next section. For now, see Figure 5.9, which displays a time series of white noise.
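The three characteristics of white noise can be checked informally with a short simulation. The sketch below generates simulated noise (not data from this text), splits it into windows, and computes the mean and variance of each window along with the correlation between the series and a copy of itself shifted by one step. The window size of 100 and the helper `lag_correlation` are illustrative choices, not a standard test.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the run is repeatable

# Simulated white noise: independent random draws with mean 0 and variance 1.
noise = [random.gauss(0, 1) for _ in range(1000)]

# Split into non-overlapping windows of 100 consecutive terms.
window_size = 100
windows = [noise[i:i + window_size] for i in range(0, len(noise), window_size)]
window_means = [mean(w) for w in windows]
window_variances = [pvariance(w) for w in windows]

def lag_correlation(values, lag):
    """Sample correlation between the series and a copy shifted by `lag` steps."""
    center = mean(values)
    numerator = sum((values[i] - center) * (values[i + lag] - center)
                    for i in range(len(values) - lag))
    denominator = sum((v - center) ** 2 for v in values)
    return numerator / denominator

lag1 = lag_correlation(noise, 1)

print("window means:", [round(m, 2) for m in window_means])
print("window variances:", [round(v, 2) for v in window_variances])
print("lag-1 correlation:", round(lag1, 3))
```

For white noise, the window means should all be near 0, the window variances should all be roughly equal, and the lag-1 correlation should be near 0. Residuals that fail one of these checks suggest that the model has missed some predictable structure.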