Principles of Data Science

5.3 Time Series Forecasting Methods


Learning Outcomes

By the end of this section, you should be able to:

  • 5.3.1 Analyze a time series by decomposing it into its components, including trend, seasonal and cyclic variation, and residual noise.
  • 5.3.2 Compute various kinds of moving averages, weighted moving averages, and related smoothing techniques.
  • 5.3.3 Describe the ARIMA procedure and how autocorrelation functions can be used to find appropriate parameters.
  • 5.3.4 Use ARIMA to make forecasts, with the aid of Python.

This section focuses on decomposing a time series into its components for the purpose of building a robust model for forecasting. Although much of the analysis can be done by hand, numerous software applications have been developed to handle this task. Ultimately, we will see how Python can be used to decompose a time series and create a general-purpose model called the autoregressive integrated moving average model (commonly known as ARIMA) to make forecasts.

Detrending the Data and Smoothing Techniques

Detrending a time series refers to the process of separating the underlying trend component $t_n$ from the time series as a whole through one of two complementary operations: isolating the trend so that it can be removed, or transforming the data so that the trend is flattened out, leaving only the seasonal and error components. Methods that reduce the error term and flatten out seasonal variations result in a trend-cycle component $t_n$ that still closely follows the long-term ups and downs in the time series. In this case, the other components of the time series can be found by removing the trend component:

$$x_n - t_n = s_n + \varepsilon_n$$

On the other hand, there are detrending methods that transform the data into a new time series with a trendline closer to a constant line at the level of 0. In other words, after applying such a transformation, the new time series, $z_n$, would consist of only the seasonal and noise components:

$$z_n = s_n + \varepsilon_n$$

Common methods for isolating or removing the trend or trend-cycle component include the following:

  1. Simple moving average (SMA). The simple moving average (SMA) is found by taking the average (arithmetic mean) of a fixed number of consecutive data points. This has the effect of smoothing out short-term variations in the data and isolating the trend. If the data has seasonal variation, then an SMA of the same length can effectively remove the seasonal component completely—there is more about this later in this section.
  2. Weighted moving average (WMA). There are several variations on the moving average idea, in which previous terms of the series are given different weights, thus termed weighted moving average (WMA). One of the most common weighted moving averages is the exponential moving average (EMA), in which the weights follow exponential decay, giving more recent data points much more weight than those further away in time. These techniques are more often used in forecasting as discussed later in this section, but they may also be employed as a smoothing method to isolate a trend in the data.
  3. Differencing. The (first-order) difference of a time series is found by differencing, or taking differences of consecutive terms—that is, the $n$th term of the difference would be $x_{n+1} - x_n$. If the original time series had a linear trend, then the difference would display a relatively constant trend with only noise and seasonal variation present. Similarly, the difference of the difference, or second-order difference, can transform a quadratic trend into a constant trend. Higher-order polynomial trends can be identified using higher-order differences. Thus, we can use differencing to check for an overall trend of a certain type (linear, quadratic, etc.) while at the same time removing the original trend from the time series.
  4. Regression. As mentioned in Example 5.1, linear regression may be applied to a time series that seems to display a uniform upward or downward trend. After differencing has identified a trend of some particular order, linear, quadratic, or higher-order polynomial regression can be used to find that trend.
  5. Seasonal-Trend decomposition using LOESS (STL). Seasonal-Trend decomposition using LOESS (STL) is a powerful tool that decomposes a time series into the trend-cycle, seasonal, and noise components. Later in this section we will find out how to perform STL using Python. (Note: LOESS is a regression technique that fits a low-order polynomial—typically linear or quadratic—to small portions of the data at a time. A full understanding of LOESS is not required for using STL.)

There are, of course, more complex methods for analyzing and isolating the trends of time series, but in this text we will stick to the basics and make use of technology when appropriate.

Simple Moving Averages (SMA)

Recall that we use the term window to refer to consecutive terms of a time series. In what follows, assume that the size of the window is a constant, $T$. Given a time series with values $(x_n)$, that is, a sequence $(x_1, x_2, x_3, x_4, \ldots, x_N)$, the simple moving average with window size $T$ is defined as a new series $(y_n)$ with values given by the formula

$$y_n = \frac{x_n + x_{n-1} + x_{n-2} + \cdots + x_{n-T+1}}{T} = \frac{1}{T}\sum_{k=1}^{T} x_{n-T+k}$$

This formula is not as difficult as it looks. Just think of it as the average of the last $T$ terms. For example, given the sequence (5, 4, 2, 6, 2, 3, 1, 7), the SMA of length 4 would be found by averaging 4 consecutive terms at a time. Note that the first term of the SMA is $y_4$ because at least 4 terms are needed in the average.

Term $k = 4$: $y_4 = \frac{5+4+2+6}{4} = \frac{17}{4} = 4.25$

Term $k = 5$: $y_5 = \frac{4+2+6+2}{4} = \frac{14}{4} = 3.5$

Term $k = 6$: $y_6 = \frac{2+6+2+3}{4} = \frac{13}{4} = 3.25$

Term $k = 7$: $y_7 = \frac{6+2+3+1}{4} = \frac{12}{4} = 3$

Term $k = 8$: $y_8 = \frac{2+3+1+7}{4} = \frac{13}{4} = 3.25$

Thus, the SMA of length 4 is the sequence (4.25, 3.5, 3.25, 3, 3.25). Note how the values of the SMA do not jump around as much as the values of the original sequence.
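If you want to check such computations by machine, pandas (used extensively later in this section) can reproduce them with the rolling() method. Here is a minimal sketch using the toy sequence above; the first three entries of the result are undefined because each average needs 4 terms.

    import pandas as pd
    
    # The toy sequence from the example above
    x = pd.Series([5, 4, 2, 6, 2, 3, 1, 7])
    
    # Rolling mean with window size 4; the first 3 entries are NaN
    # because each average needs 4 consecutive terms
    sma = x.rolling(window=4).mean()
    print(sma.dropna().tolist())   # [4.25, 3.5, 3.25, 3.0, 3.25]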

The number of terms to use in a simple moving average depends on the dataset and desired outcome. If the intent is to reduce noise, then a small number of terms (say, 3 to 5) may be sufficient. Sometimes inconsistencies in data collection can be mitigated by a judicious use of SMA. For example, suppose data is collected on the number of walk-in customers at the department of motor vehicles, which only operates Monday through Friday. No data is collected on the weekends, while the number of walk-ins on Mondays may be disproportionately high because people have had to wait until Monday to use their services. Here, an SMA of 7 days would smooth out the data, spreading the weekly walk-in data more evenly over all the days of the week.

Example 5.3

Problem

Find the simple moving average of window size 3 for the data from Table 5.1 and compare with the original line graph.

Simple moving averages provide an easy way to identify a trend-cycle curve in the time series. In effect, the SMA can serve as an estimate of the $t_n$ component of the time series. In practice, when using SMA to isolate the trend-cycle component, the terms of the SMA are centered, which means that when computing the SMA term $y_k$, the $x_k$ term shows up right in the middle of the window, with the same number of terms to the left and to the right.

For example, the SMA found in Example 5.3 may be shifted so that $y_k = \frac{x_{k-1} + x_k + x_{k+1}}{3}$. This has the effect of shifting the index of the SMA terms back 1 so that the SMA begins at index $k = 2$.

$$y_2 = \frac{x_1 + x_2 + x_3}{3} = \frac{1,848.36 + 2,058.90 + 2,043.94}{3} = 1,983.73$$

This procedure works well when the window size is odd, but when it is even, a common practice is to compute two SMAs in which $x_k$ is very nearly in the middle of the sequence and then average their results. For example, using the same data as in Example 5.3, if we wanted to compute a centered SMA of length 4, then the first term would be $y_3$, and we would find two averages.

$$\frac{x_1 + x_2 + x_3 + x_4}{4} = \frac{1,848.36 + 2,058.90 + 2,043.94 + 2,238.83}{4} = 2,047.51$$
$$\frac{x_2 + x_3 + x_4 + x_5}{4} = \frac{2,058.90 + 2,043.94 + 2,238.83 + 2,673.61}{4} = 2,253.82$$
$$y_3 = \frac{2,047.51 + 2,253.82}{2} = 2,150.67$$

The average of the two averages is equivalent to the following more efficient formula. First consider the previous example.

$$y_3 = \frac{1}{2}\left(\frac{x_1 + x_2 + x_3 + x_4}{4} + \frac{x_2 + x_3 + x_4 + x_5}{4}\right) = \frac{x_1}{8} + \frac{x_2}{4} + \frac{x_3}{4} + \frac{x_4}{4} + \frac{x_5}{8}$$

This suggests a general pattern for even window sizes. In summary, we have two closely related formulas for centered SMA, depending on whether $T$ is even or odd.

$$y_k = \frac{1}{T}\left(x_{k-(T-1)/2} + \cdots + x_{k+(T-1)/2}\right), \quad \text{for } T \text{ odd}$$
$$y_k = \frac{x_{k-T/2}}{2T} + \frac{1}{T}\left(x_{k-T/2+1} + \cdots + x_{k+T/2-1}\right) + \frac{x_{k+T/2}}{2T}, \quad \text{for } T \text{ even}$$

Of course, there are functions in statistical software that can work out the centered SMA automatically for you.
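For instance, pandas can compute a centered SMA directly with the center=True option of rolling(). The sketch below reproduces the centered averages worked out above, using the five values quoted from Table 5.1; note that for an even window, pandas does not apply the two-pass averaging automatically, so that step is added by hand.

    import pandas as pd
    
    # First five values from Table 5.1, as quoted in the text above
    x = pd.Series([1848.36, 2058.90, 2043.94, 2238.83, 2673.61])
    
    # Centered SMA with an odd window (T = 3): the label sits in the middle
    centered3 = x.rolling(window=3, center=True).mean()
    print(round(centered3[1], 2))   # 1983.73, matching y_2 above
    
    # Centered SMA with an even window (T = 4): average two adjacent
    # 4-term SMAs, which matches the two-pass formula in the text
    sma4 = x.rolling(window=4).mean()
    centered4 = sma4.rolling(window=2).mean().shift(-2)
    print(round(centered4[2], 2))   # 2150.66 (the 2,150.67 above, up to rounding)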

Differencing

In mathematics, the first-order difference of a sequence $(x_n) = (x_1, x_2, x_3, x_4, \ldots, x_N)$ is another sequence, denoted $\Delta(x_n)$, consisting of the differences of consecutive terms. That is,

$$\Delta(x_n) = (x_{n+1} - x_n) = (x_2 - x_1, x_3 - x_2, x_4 - x_3, \ldots, x_N - x_{N-1})$$

Note that there is one fewer term in the difference $\Delta(x_n)$ compared to the original time series $(x_n)$. Differencing has the effect of leveling out a uniform upward or downward trend without affecting other factors such as seasonal variation and noise.

Moreover, the average (arithmetic mean) of the terms of the difference series measures the rate of change of the original time series, much as the slope of the linear regression does. (Indeed, if you have seen some calculus, you might notice that there is a strong connection between differencing and computing the derivative of a continuous function.)

Example 5.4

Problem

Find the first-order difference time series for the data from Table 5.1 and compare with the original line graph. Find the average of the difference series and compare the value to the slope of the linear regression found in Example 5.1.

Differencing can be used to check whether a time series trend curve is essentially linear, quadratic, or polynomial of higher order. First, we need to recall a definition from algebra that is typically applied to polynomial functions but can also be applied to sequences. If a sequence $(x_n)$ is given by a polynomial formula of the form

$$x_n = f(n) = a_r n^r + a_{r-1} n^{r-1} + \cdots + a_1 n + a_0$$

where $r$ is a whole number, each of the coefficients $a_k$ (for $k = 0, 1, 2, \ldots, r$) is a constant, and the leading coefficient is nonzero ($a_r \neq 0$), then the sequence has order or degree $r$. (Here, order and degree refer to the degree of the polynomial; later in this section, order will also refer to the number of terms or lags used in a model to describe a time series.)

For example, any sequence of the form $x_n = an + b$ has order 1 (and may be called linear). A sequence of the form $x_n = an^2 + bn + c$ has order 2 (and is called quadratic). Quadratic data fits well to a parabola, which would indicate a curving trend in the data.

If a sequence $(x_n)$ has order $r > 0$, then the difference sequence $\Delta(x_n)$ has order $r - 1$. If the original sequence is already of order 0, which means the sequence is a constant sequence (all terms being the same constant), then the difference sequence simply produces another constant sequence (order 0) in which every term is equal to 0. This important result can be proved mathematically, but we will not present the proof in this text.

As an example, consider the sequence $(x_n) = (2, 5, 8, 11, 14, 17, 20)$, which has order 1, since you can find the values using the formula $x_n = 3n - 1$. The difference sequence is $\Delta(x_n) = (3, 3, 3, 3, 3, 3)$, which is a constant sequence, so it has order 0. Notice that taking the difference one more time yields the constant 0 sequence, $\Delta(\Delta(x_n)) = (0, 0, 0, 0, 0)$, and taking further differences would only reduce the number of terms, but each term would remain at the value 0. (For simplicity, we will assume that there are always enough terms in a sequence so that the difference is also a non-empty sequence.)

The second-order difference, $\Delta^2(x_n) = \Delta(\Delta(x_n))$, or difference of the difference, would reduce the order of an $r$-order sequence twice, to $r - 2$ if $r \geq 2$ or to the constant zero sequence if $r \leq 1$. In general, a $d$-order difference, defined by $\Delta^d(x_n) = \Delta(\Delta^{d-1}(x_n))$, reduces the order of an $r$-order sequence to $r - d$ if $r \geq d$ or to order 0 otherwise. With a little thought, you may realize that we now have a simple procedure to determine the order of a sequence that does not require us to have the formula for $x_n$ in hand.

If a sequence $(x_n)$ has a well-defined order (in other words, there is a polynomial formula that is a good model for the terms of the sequence, even if we do not know what that formula is yet), then the order is the smallest whole number $r$ such that the $(r+1)$-order difference, $\Delta^{r+1}(x_n)$, is equal to either the constant 0 sequence or white noise.

Example 5.5

Problem

Determine the order of the sequence (10, 9, 6, 7, 18, 45, 94, 171) using differencing.
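One way to carry out this check (and solve the problem above) is to difference repeatedly until the result is constant. Here is a minimal NumPy sketch using the sequence from the problem; with real, noisy data you would stop when the difference looks like white noise rather than an exactly constant sequence.

    import numpy as np
    
    x = np.array([10, 9, 6, 7, 18, 45, 94, 171])
    
    # Difference repeatedly until a constant sequence appears
    d = x
    for order in range(1, 5):
        d = np.diff(d)
        print(f'order {order}: {d}')
        if np.all(d == d[0]):   # constant sequence reached, so this is the order
            print(f'The sequence has order {order}.')
            break

Running the sketch shows the third-order difference is the constant sequence (6, 6, 6, 6, 6), so the sequence has order 3.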

We cannot hope to encounter real-world time series data that is exactly linear, quadratic, or any given degree. Every time series will have some noise in addition to possible seasonal patterns. However, the trend-cycle curve may still be well approximated by a polynomial model of some order.

SMA and Differencing in Excel

Excel can easily do simple moving averages and differencing. The ability to write one equation in a cell and then drag it down so that it works on all the rows of data is a very powerful and convenient feature. By way of example, let’s explore the dataset MonthlyCoalConsumption.xlsx. First, you will need to save the file as an Excel worksheet, as CSV files do not support equations.

Let’s include a centered 13-term window SMA (see Figure 5.12). In cell C8, type “=AVERAGE(B2:B14)” as shown in Figure 5.12. Note: There should be exactly 6 rows above and 6 rows below the center row at which the formula is entered.

A screenshot of an Excel worksheet with two columns of text labeled Month and Value. 13 data points in the Value column (B) are selected and the formula =AVERAGE(B2:B14) is applied in the formula bar.
Figure 5.12 Screenshot of a 13-term Excel Window SMA (Used with permission from Microsoft)

Next, click on the square in the lower right corner of cell C8 and drag it all the way down to cell C173, which is 6 rows before the final data value. Column C now contains the SMA time series for your data, as shown in Figure 5.13.

A screenshot of an Excel worksheet with three columns of text labeled Month, Value, and SMA. Columns A and B have 19 rows of data. Column C, starting in row 8, shows the SMA time series.
Figure 5.13 Screenshot Showing the SMA Time Series in Column C (Used with permission from Microsoft)

You can generate the graph of the time series and the SMA together, which is shown in Figure 5.14. Compare your computed SMA with the trendline created by Excel in Figure 5.7.

A line chart titled Monthly consumption of coal for electricity generation in the United States from 2016 to 2022. The X axis has months from January 2016 to May 2022. The Y axis ranges from 0 to 80,000. The blue line represents the actual coal consumption, which fluctuates seasonally with peaks in winter and valleys in summer. The orange line represents the trend of coal consumption over time, showing a general downward trend from around 80,000 tons per month in 2016 to around 40,000 tons per month in 2022.
Figure 5.14 Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022. The SMA of window length 13 has been added to the graph in orange. (data source: https://www.statista.com/statistics/1170053/monthly-coal-energy-consumption-in-the-us/)

Let’s now work out the first-order difference. The idea is very similar. Put the formula for the difference, which is nothing more than “=B3-B2,” into cell D3, as in Figure 5.15.

A screenshot of an Excel worksheet with four columns labeled Month, Value, SMA, and Difference. Cells B2 and B3 are selected and the Difference formula =B3-B2 is applied in the formula bar.
Figure 5.15 Screenshot of Excel Showing Formula for Difference in Column D (Used with permission from Microsoft)

Then drag the formula down to the end of the column. The result is shown in Figure 5.16.

A screenshot of an Excel worksheet with four columns labeled Month, Value, SMA, and Difference. Row D shows that the Difference formula =B3-B2 has been applied to the whole column.
Figure 5.16 Screenshot of Excel Showing Difference in Column D (Used with permission from Microsoft)

If you need to compute the second-order differences, just do the same procedure to find the difference of column D. Higher-order differences can easily be computed in the analogous way.

SMA and Differencing in Python

Computing an SMA in Python takes only a single line of code. Here is how it works on the Coal Consumption dataset (found in MonthlyCoalConsumption.xlsx), using a window size of 12. (Before, we used a window of 13 because odd-length windows are easier to work with than even-length ones. However, Python easily handles all cases automatically.) The key here is the function rolling() that creates a DataFrame of rolling windows of specified length. Then the mean() command computes the average of each window of data. Computing the first-order difference is just as easy, using the diff() command.

Python Code

    
    # Import libraries
    import pandas as pd # for dataset management
    import matplotlib.pyplot as plt # for visualizations
    from matplotlib.ticker import MaxNLocator, FuncFormatter # for graph formatting and y-axis formatting
    import matplotlib.dates as mdates # for date formatting
    
    # Read the Excel file into a Pandas DataFrame
    df = pd.read_excel('MonthlyCoalConsumption.xlsx')
    
    # Convert 'Month' column to datetime format
    df['Month'] = pd.to_datetime(df['Month'], format='%m/%d/%Y')
    
    # Calculate a simple moving average with a window size of 12
    df['SMA'] = df['Value'].rolling(window=12).mean()
    
    # Calculate the first-order difference
    df['Diff'] = df['Value'].diff()
    
    # Function to format the Y-axis values
    def y_format(value, tick_number):
      return f'{int(round(value)):,}'
    
    # Create the plot
    plt.figure(figsize=(12, 6))
    
    # Plot the time series, SMA, and difference
    plt.plot(df['Month'], df['Value'], marker='', linestyle='-', color='b', label='Value')
    plt.plot(df['Month'], df['SMA'], marker='', linestyle='-', color='r', label='SMA')
    plt.plot(df['Month'], df['Diff'], marker='', linestyle='-', color='g', label='Diff')
    
    # Set the x-axis limits to start from the first date and end at the maximum date in the data
    plt.xlim(pd.to_datetime('01/01/2016', format='%m/%d/%Y'), df['Month'].max())
    
    # Format the x-axis to show dates as MM/DD/YYYY and set the locator for a fixed number of ticks
    ax = plt.gca()
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y'))
    ax.xaxis.set_major_locator(MaxNLocator(nbins=12)) # Set the number of ticks to a reasonable value
    
    # Manually set the rotation of date labels to 45 degrees
    plt.xticks(rotation=45)
    
    # Title and labels
    plt.title('Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022')
    plt.xlabel('Month')
    plt.legend()
    
    # Apply the formatter to the Y-axis
    ax.yaxis.set_major_formatter(FuncFormatter(y_format))
    
    plt.tight_layout() # Adjust layout to ensure labels fit well
    plt.show() 
    

The resulting output will look like this:

A line chart titled Monthly consumption of coal for electricity generation in the United States from 2016 to 2022. The X axis has months from July 2016 to January 2022. The Y axis ranges from -20,000 to 60,000. The blue line represents the actual coal consumption, which fluctuates seasonally. The red line represents the 12-month simple moving average (SMA), which smooths out the seasonal fluctuations. The green line represents the first-order difference of the series, which fluctuates around zero. Coal consumption shows a general downward trend over the period.

Analyzing Seasonality

Once the time series has been detrended using one of the methods already discussed (or other, more sophisticated methods), what is left over should represent only the seasonality, or seasonal variation, $s_n$, and error term, $\varepsilon_n$.

$$z_n = s_n + \varepsilon_n$$

How could we analyze and model $s_n$? Suppose there were no error term at all ($\varepsilon_n = 0$), so that we are looking at a time series with pure seasonal variation and no other factors. In this case, the component $s_n$ would look like a repeating pattern, such as the pattern shown in Figure 5.17. As mentioned in Components of Time Series Analysis, the time interval that it takes before the pattern completes once and begins to repeat is called the period. In a purely seasonal component, we would expect $s_{n+P} = s_n$, where $P$ is the period, for all values of $n$.

A line chart. The X axis ranges from 1 to 225. The Y axis ranges from -5 to 5. A blue line shows ups and downs representing a pure seasonal variation. The same pattern repeats after 64 time steps.
Figure 5.17 Pure Seasonal Variation. The same pattern repeats after 64 time steps, and so the period is 64.

While it is easy to spot the repeating pattern in a time series that does not have noise, the same cannot be said for the general case. There are several tools available for isolating the seasonal component of a detrended time series.

Identifying the Period

First we need to identify the period of the seasonal data. The simplest method is to use prior knowledge about the time series data, along with a visual inspection. Often, a seasonal pattern is exactly that—a pattern that repeats over the course of one year. Thus, the period would be $P = 1$ year (or 12 months, 52 weeks, 365 days, etc., depending on the time steps of your time series). For example, even without detrending the data, it was easy to identify a period of 4 quarters (1 year) in the data in Example 5.2.

If repetition of a seasonal pattern is not apparent from the data, then it may be discovered using standard statistical measures such as autocorrelation. Autocorrelation is the measure of correlation between a time series and the same time series shifted by some number of time steps. (Correlation and Linear Regression Analysis discussed correlation between two sets of data. Autocorrelation is the same idea applied to the same dataset with a shifted copy of itself.) The size of the shift is called the lag.

The autocorrelation function (ACF) measures the correlation at each lag starting with 0 or 1. Note, lag 0 always produces an autocorrelation of 1 since the time series is being compared to an exact copy of itself. Thus, we are only interested in the correlation coefficient at positive lags. However, small lags generally show high correlation with the original time series as well, simply because data measured within shorter time periods are generally similar to each other. So, even if the ACF shows relatively large correlation at lags of 1, 2, 3, etc., there may not be a true seasonal component present. The positive lag corresponding to the first significant peak, or autocorrelation value that is much larger than the previous values, may indicate a seasonal component with that period.

Later, we will see how to compute ACF of a time series using Python, but first let’s explore further topics regarding seasonal components.

Seasonally Adjusted Data

Once you have determined the period of the seasonal component, you can then extract snsn from the time series. As usual, there are a number of different ways to proceed.

Let’s assume that you have detrended the time series so that what remains consists of a seasonal component and noise term:

$$z_n = x_n - t_n = s_n + \varepsilon_n$$

Moreover, suppose that the period of $s_n$ is $P$. Of course, with the noise term still present in the time series, we cannot expect that $s_{n+P} = s_n$ holds exactly, but it should be the case that $s_{n+P} \approx s_n$ for every $n$. In other words, the data should look approximately the same when shifted by $P$ time steps. Our job now is to isolate one complete period of data that models the seasonal variation throughout the time series. In other words, we want to produce an estimate $\hat{s}_n$ that is exactly periodic ($\hat{s}_{n+P} = \hat{s}_n$, for all $n$) and models the underlying seasonal component as closely as possible.

For any given index $n \leq P$, we expect the data points $(z_n, z_{n+P}, z_{n+2P}, z_{n+3P}, \ldots)$ to hover around the same value or, in statistical language, to have rather low variance (much lower than the time series as a whole, that is). So it seems reasonable to use the mean of this slice of data to define the value of $\hat{s}_n$. For simplicity, assume that the time series has exactly $M \cdot P$ data points (if not, then the data can be padded in an appropriate way). Then the estimate for the seasonal component would be defined for $1 \leq n \leq P$ by the formula

$$\hat{s}_n = \frac{1}{M}\left(z_n + z_{n+P} + z_{n+2P} + z_{n+3P} + \cdots + z_{n+(M-1)P}\right)$$

As usual, the formula looks more intimidating than the concept that it's based upon. As previously mentioned, this represents nothing more than the average of all the values of the time series that occur every $P$ time steps.

Once $\hat{s}_n$ is computed in the range $1 \leq n \leq P$, extend it by repeating the same values in each period until you reach the last index of the original series ($n = MP$). (Note: The noise term may now be estimated as the series of residuals, $\varepsilon_n \approx z_n - \hat{s}_n$, which will occupy our interest in Forecast Evaluation Methods.)

Finally, the estimate $\hat{s}_n$ can then be used in conjunction with the original time series $(x_n)$ to smooth out the seasonal variation. Note that $\hat{s}_n$ essentially measures the fluctuations of the observed data $x_n$ above or below the trend. We expect the mean of $\hat{s}_n$ to be close to zero in any window of time. Define the seasonally adjusted data as the time series

$$x_n - \hat{s}_n$$

A seasonally adjusted time series is closely related to the underlying trend-cycle curve, $t_n$, though it generally still contains the noise and other random factors that $t_n$ does not include.
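To make the averaging procedure concrete, here is a minimal pandas sketch on a made-up detrended series with period $P = 4$; the series z and all of its values are fabricated purely for illustration.

    import numpy as np
    import pandas as pd
    
    P = 4                                     # assumed seasonal period
    M = 6                                     # number of complete periods
    rng = np.random.default_rng(0)
    
    # Fabricated detrended series: a repeating pattern plus noise
    z = pd.Series(np.tile([5.0, -2.0, -4.0, 1.0], M) + rng.normal(0, 0.5, M * P))
    
    # Average the values that occur every P steps:
    # s_hat[n] = mean(z[n], z[n+P], z[n+2P], ...)
    s_hat_period = z.groupby(z.index % P).mean()
    
    # Extend the one-period estimate periodically over the whole series
    s_hat = pd.Series(np.tile(s_hat_period.values, M))
    
    # The residuals estimate the noise term; subtracting s_hat from the
    # original (untrended) series would give the seasonally adjusted data
    residuals = z - s_hat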

Example 5.6

Problem

The revenue data from Walt's Water Adventures (see Table 5.2) seems to display seasonality of period 4. Using the level as the trend, detrend the data, compute the seasonal component $\hat{s}_n$, and find the seasonally adjusted time series.

Decomposing a Time Series into Components Using Python

There is a powerful Python library known as statsmodels that collects many useful statistical functions together. We will use the STL module to decompose the time series MonthlyCoalConsumption.xlsx into its components. First, we plot an ACF (autocorrelation function) to determine the best value for periodicity.

Python Code

    # Import libraries
    import pandas as pd # for dataset management
    from statsmodels.graphics.tsaplots import plot_acf
    
    # Read the Excel file into a Pandas Dataframe
    df = pd.read_excel('MonthlyCoalConsumption.xlsx')
    
    # Plot the ACF
    plot_acf(df['Value'], lags=20, title='ACF');
    

The resulting output will look like this:

ACF plot with 21 lags. The y-axis ranges from -1 to 1 and the x-axis from 0 to 20. The plot shows autocorrelation values for each lag, with confidence intervals indicated by shaded areas.

Points that are within the blue shaded area correspond to lags at which the autocorrelation is statistically insignificant. The lags that show the most significance are at 0, 1, 6, and 12. Lag 0 always gives 100% autocorrelation because we are comparing the unshifted series to itself. At lag 1, a high autocorrelation corresponds to the fact that consecutive data points are relatively close to each other (which is also to be expected in most datasets). Of special note are the lags at 6 and 12, corresponding to a possible seasonal period of 6 or 12 in the data. We will use period 12 in our STL decomposition.

Python Code

    # Import libraries
    from statsmodels.tsa.seasonal import STL
    import matplotlib.pyplot as plt ## for data visualization
    from matplotlib.ticker import FuncFormatter ## for the y-axis (reuses y_format defined earlier)
    
    # Perform seasonal decomposition using STL
    stl = STL(df['Value'], period=12)
    result = stl.fit()
    
    # Plot components and format the y-axis labels
    fig = result.plot()
    for ax in fig.axes:
      ax.yaxis.set_major_formatter(FuncFormatter(y_format))
    
    plt.show()
    

The resulting output will look like this:

 
A time series decomposition plot labeled Value with four stacked subplots sharing an x-axis from 0 to 75: the original data, the trend (about 40,000 to 60,000), the seasonal component (about -10,000 to 10,000), and the residuals (about -5,000 to 5,000).

The trend, seasonal variation, and residual (noise) components are all graphed in separate plots. To see them all on the same set of axes, we can use matplotlib (noise omitted).

Python Code

    import matplotlib.pyplot as plt ## for data visualization
    
    # Extract the trend and seasonal components
    trend = result.trend
    seasonal = result.seasonal
    
    # Plot the original data, trend, and seasonally adjusted data
    plt.plot(df['Value'], label='Original Data')
    plt.plot(trend, label='Trend')
    plt.plot(seasonal, label='Seasonal')
    plt.legend()
    
    # Apply the formatter to the Y-axis
    plt.gca().yaxis.set_major_formatter(FuncFormatter(y_format))
    plt.show()
    

The resulting output will look like this:

A time series plot showing original data, trend, and seasonal components. Y-axis ranges from 0 to 70,000, x-axis from 0 to 80. Trend line is sloping downward and seasonal component fluctuates around zero.

Weighted Moving Averages and Exponential Moving Averages

Simple moving averages were discussed in detail at the beginning of this section. Now we expand the discussion to include more sophisticated weighted moving averages and their uses in smoothing data and forecasting. Recall from earlier in this section that weighted moving averages are moving averages in which terms are given weights according to some formula or rule.

Suppose that a time series $(x_n)$ has no significant seasonal component (or the seasonal component has been removed so that the remaining series is seasonally adjusted). Thus, the series has only trend-cycle and noise components:

$$x_n = t_n + \varepsilon_n$$

As already discussed, a simple moving average (SMA) has the effect of smoothing out the data and providing an easy method for predicting the next term of the sequence. For reference, here is the SMA prediction model, with a window of $T$ terms:

$$\hat{x}_{n+1} = \frac{x_n + x_{n-1} + x_{n-2} + \cdots + x_{n-T+1}}{T}$$

This model may be regarded as a weighted moving average model (WMA) in which the weights of the most recent $T$ terms are all equal to $\frac{1}{T}$, while all other weights are set to 0. For example, in an SMA model with $T = 3$, the weights would be $\frac{1}{3}, \frac{1}{3}, \frac{1}{3}, 0, 0, 0, \ldots$

$$\hat{x}_{n+1} = \frac{x_n + x_{n-1} + x_{n-2}}{3} = \frac{1}{3}x_n + \frac{1}{3}x_{n-1} + \frac{1}{3}x_{n-2} + 0 \cdot x_{n-3} + 0 \cdot x_{n-4} + \cdots$$

Even the naïve model, $\hat{x}_{n+1} = x_n$, counts as a WMA, with weights 1, 0, 0, …

In principle, a weighted moving average model may be built using any coefficients, giving a general form for any WMA:

$$\hat{x}_{n+1} = a_n x_n + a_{n-1} x_{n-1} + a_{n-2} x_{n-2} + a_{n-3} x_{n-3} + \cdots$$

Of course, if we want the model to do a good job in forecasting new values of the time series, then the coefficients must be chosen wisely. Typically, the weights must be normalized, which means that they sum to 1.

$$a_n + a_{n-1} + a_{n-2} + a_{n-3} + \cdots = 1$$

However, there is no hard-and-fast requirement that the sum must be equal to 1.

Rather than giving the same weight to some number of previous terms as the SMA model does, you might choose to give greater weight to more recent terms. For example, one might use powers of $\frac{1}{2}$ as weights.

$$\hat{x}_{n+1} = \frac{1}{2}x_n + \frac{1}{4}x_{n-1} + \frac{1}{8}x_{n-2} + \frac{1}{16}x_{n-3} + \cdots$$

This method allows the most recent term to dominate the sum, and each successive term contributes half as much in turn. (Note: The sequence $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \ldots$ would be a normalized sequence if infinitely many terms were included, but even a small finite number of terms, say 5 or so, adds up to a sum that is quite close to 1.)

This is an example of an exponential moving average, or EMA. The most general formula for the EMA model is shown below. (Note that the previous example corresponds to an EMA model with $\alpha = \frac{1}{2}$.)

$$\hat{x}_{n+1} = \alpha x_n + \alpha(1-\alpha)x_{n-1} + \alpha(1-\alpha)^2 x_{n-2} + \alpha(1-\alpha)^3 x_{n-3} + \cdots$$

Example 5.7

Problem

Use EMA models with $\alpha = \frac{1}{2}$ (equivalently, 0.5) and $\alpha = 0.7$ for the S&P Index data from Table 5.1 to estimate the value of the S&P Index at the end of year 2023.

EMA models may also serve as smoothing techniques, isolating the trend-cycle component. In this context, it’s better to use an equivalent form of the EMA model:

$$\hat{x}_{n+1} = \alpha x_n + (1-\alpha)\hat{x}_n$$

In this form, we build a series of estimates $(\hat{x}_n)$ recursively by incorporating the terms of the original time series one at a time. The idea is that the next estimate, $\hat{x}_{n+1}$, is found as a weighted average of the current data point, $x_n$, and the current estimate, $\hat{x}_n$. Note that we may set $\hat{x}_1 = x_1$ since we need to begin the process somewhere. It can be shown mathematically that this procedure is essentially the same as working out the sum of exponentially weighted terms directly, and it is much more efficient.
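In pandas, for example, this recursion is what the ewm() method computes when adjust=False is set; its output at time step $n$ matches the estimate $\hat{x}_{n+1}$, one step later. Here is a short sketch on a made-up series comparing the explicit loop with the pandas result.

    import pandas as pd
    
    x = pd.Series([5.0, 4.0, 2.0, 6.0, 2.0, 3.0])   # made-up data for illustration
    alpha = 0.75
    
    # The recursion from the text: x_hat(n+1) = alpha*x(n) + (1 - alpha)*x_hat(n)
    x_hat = [x.iloc[0]]                              # start with x_hat_1 = x_1
    for xn in x.iloc[:-1]:
        x_hat.append(alpha * xn + (1 - alpha) * x_hat[-1])
    
    # pandas computes the same recursion, shifted by one time step
    ema = x.ewm(alpha=alpha, adjust=False).mean()
    print(x_hat[1:])                 # [5.0, 4.25, 2.5625, 5.140625, 2.78515625]
    print(ema.iloc[:-1].tolist())    # identical values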

Example 5.8

Problem

Use the recursive EMA formula with $\alpha = 0.75$ to smooth the S&P Index time series (Table 5.1), and then use the EMA model to estimate the value of the S&P Index at the end of 2024. Plot the EMA together with the original time series.

Autoregressive Integrated Moving Average (ARIMA)

In this section, we develop a very powerful tool that combines multiple models together and is able to capture trend and seasonality. The method is called autoregressive integrated moving average, or ARIMA for short, and it consists of three main components:

  1. Autoregressive (AR) model. This component captures patterns in the time series that are repeated, including seasonal components as well as correlation with recent values.
  2. Integrative (I) component. This component represents the differencing operation discussed in the section on detrending a time series.
  3. Moving average (MA) model. Despite the name, this component is not the same as the moving average discussed previously, although there is a mathematical connection. Instead of focusing on averages of existing data points, MA models incorporate the influence of past forecast errors to predict the next value of the time series.

Before getting into the specifics of each component of ARIMA, we need to talk about the term stationary, which refers to a time series in which the variance is relatively constant over time, an overall upward or downward trend cannot be found, and no seasonal patterns exist. White noise is an example of a stationary time series; however, the term stationary is a bit more general, as it includes the possibility of a cyclic component in the data. Recall, a cyclic component (despite its name) is a longer-term rise or fall in the values of a time series that may occur from time to time, but the occurrences are unpredictable.

The AR and MA models work best on stationary time series. The integrative component is the tool that can detrend the data, transforming a nonstationary time series into a stationary one so that AR and/or MA may be used.

Autoregressive Models

The autoregressive models take into account a fixed number of previous values of the time series in order to predict the next value. In this way, AR models are much like WMA models previously discussed, in the sense that the next term of the time series depends on a sum of weighted terms of previous values; however, in the AR model, the terms used are the terms of the model itself.

Suppose we want to model a time series, $(x_n)$, using an AR model. First, we should pick a value $p$, called the order of the model, which specifies how many previous terms to include in the model. Then we may call the model an AR($p$) model. If there is a strong seasonal component of period $p$, then it makes sense to choose that value as the order.

An AR($p$) model is a recursive formula of the form

$$AR(p)_n = w_1 y_{n-1} + w_2 y_{n-2} + w_3 y_{n-3} + \cdots + w_p y_{n-p} + \varepsilon_n$$

Here, the parameters or weights, $w_1, w_2, w_3, \ldots, w_p$, are constants that must be chosen so that the values of the time series $(y_n)$ fit the known data as closely as possible (i.e., $y_n \approx x_n$ for all known observations). There are also certain restrictions on the possible values of each weight, but we do not need to delve into those specifics in this text, as we will rely on software to do the work of finding the parameters for us.
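Although we will use the full ARIMA interface later in this section, statsmodels also offers an AutoReg class for fitting a pure AR($p$) model. Here is a minimal sketch on simulated data; the coefficients 0.6 and 0.3 are chosen arbitrarily for illustration.

    import numpy as np
    from statsmodels.tsa.ar_model import AutoReg
    
    # Simulate an AR(2) process: x_n = 0.6*x_(n-1) + 0.3*x_(n-2) + noise
    rng = np.random.default_rng(1)
    x = np.zeros(300)
    for n in range(2, 300):
        x[n] = 0.6 * x[n - 1] + 0.3 * x[n - 2] + rng.normal()
    
    # Fit an AR(2) model; the fitted weights should be near 0.6 and 0.3
    results = AutoReg(x, lags=2).fit()
    print(results.params)    # intercept followed by the two lag weights
    
    # One-step-ahead forecast of the next (unobserved) value
    print(results.predict(start=len(x), end=len(x)))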

Integrative Component

The “I” in ARIMA stands for integrative, which is closely related to the word integral in calculus. Now if you have seen some calculus, you might recall that the integral is just the opposite of a derivative. The derivative, in turn, is closely related to the idea of differencing in a time series. Of course, you don’t have to know calculus to understand the idea of an integrative model.

Rather than producing another component of the time series, the integrative component of ARIMA represents the procedure of taking differences, $\Delta^d(x_n)$ (up to some specific order $d$), until the time series becomes stationary. Often, only the first-order difference is needed, and you can check for stationarity visually. A more sophisticated method, called the Augmented Dickey-Fuller (ADF) test, sets up a test statistic that measures the presence of non-stationarity in a time series. Using ADF, a p-value is computed. If the p-value is below a set threshold (such as $p < 0.01$), then the null hypothesis of non-stationarity can be rejected. Next, we will show how to run the ADF test on a time series and its differences $\Delta^d(x_n)$ to find the best value for $d$ in ARIMA.

Then the AR and MA models can be produced on the resulting stationary time series. Integrating (which is the opposite of differencing) builds the model back up so that it can ultimately be used to predict values of the original time series (xn)(xn).

Moving Average Models

An MA model of order $q$, or MA($q$), takes the form

$$MA(q)_n = L + \varepsilon_n + b_1\varepsilon_{n-1} + b_2\varepsilon_{n-2} + b_3\varepsilon_{n-3} + \cdots + b_q\varepsilon_{n-q}$$

Here, the $\varepsilon_n$ terms are residuals (forecast errors), and $L$ is the level of the time series (the mean value of all the terms). Again, there are a multitude of parameters that need to be chosen in order for the model to make sense. We will rely on software to handle this in general.
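A pure MA($q$) model can be fit in statsmodels as the special case ARIMA(0, 0, $q$). The brief sketch below simulates and fits an MA(1) series; the level 10 and coefficient 0.5 are arbitrary illustration values.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA
    
    # Simulate an MA(1) process: x_n = L + eps_n + b1*eps_(n-1)
    rng = np.random.default_rng(2)
    eps = rng.normal(size=300)
    x = 10 + eps[1:] + 0.5 * eps[:-1]    # level L = 10, b1 = 0.5
    
    # A pure MA(1) model is ARIMA with p = 0 and d = 0
    results = ARIMA(x, order=(0, 0, 1)).fit()
    print(results.params)    # estimated level, MA(1) coefficient, noise variance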

ARIMA Model in Python

The full ARIMA model, ARIMA($p, d, q$), incorporates all three components—autoregressive of order $p$, integrative with $d$-order differencing, and moving average of order $q$. For the time series MonthlyCoalConsumption.xlsx, it turns out that the second-order difference is stationary, as shown by a series of ADF tests.

Python Code

    # Import libraries
    import pandas as pd ## for dataset management
    # Import adfuller from the statsmodels library
    from statsmodels.tsa.stattools import adfuller
    
    # Read data
    df = pd.read_excel('MonthlyCoalConsumption.xlsx')
    
    print('d = 0')
    result = adfuller(df['Value'])
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
    
    ## Get the first difference
    print('\n d = 1')
    df1 = df['Value'].diff().dropna()
    result = adfuller(df1)
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
    
    ## Get the second difference
    print('\n d = 2')
    df2 = df1.diff().dropna()
    result = adfuller(df2)
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
    

The resulting output will look like this:

d = 0
ADF Statistic: -1.174289795062662
p-value: 0.6845566772896323

d = 1
ADF Statistic: -1.7838215415905252
p-value: 0.38852291349101137

d = 2
ADF Statistic: -8.578323839498081
p-value: 7.852009937900099e-14

Here, a p-value of $7.852 \times 10^{-14}$ indicates that the second-order difference is stationary, so we will set $d = 2$ in the ARIMA model. An ACF plot of the second difference can be used to determine $p$.

Python Code

    # Import libraries
    from statsmodels.graphics.tsaplots import plot_acf
    
    # Plot the ACF of the second difference
    plot_acf(df2, lags=20, title='ACF for d = 2 Difference');
     
ACF plot labeled ACF for d = 2 Difference with 21 lags showing autocorrelation values for differenced data. The y-axis ranges from -1 to 1 and the x-axis from 0 to 20. Confidence intervals are indicated by shaded areas.

We see a peak at lag 12, corresponding to the expected yearly seasonal variation, so we will set $p = 12$ in the ARIMA model. For this time series, positive values of $q$ did not change the accuracy of the model at all, so we set $q = 0$ (the MA model component is unnecessary). The model was used to forecast 24 time steps (2 years) into the future. We are using the Python library statsmodels.tsa.arima.model to create the ARIMA model.

Python Code

    # Import ARIMA model from statsmodels.tsa library
    from statsmodels.tsa.arima.model import ARIMA
    import matplotlib.pyplot as plt ## for plotting graphs
    from matplotlib.ticker import MaxNLocator ## for graph formatting
    from matplotlib.ticker import FuncFormatter ## for formatting y-axis
    
    # Fit an ARIMA model
    p = 12; d = 2; q = 0 # ARIMA component orders
    model = ARIMA(df['Value'], order=(p, d, q))
    results = model.fit()
    
    # Make forecasts
    fc = results.forecast(steps=24)
    
    # Plot the results
    plt.figure(figsize=(10, 6))
    
    # Plot original time series
    plt.plot(df['Value'], label='Original Time Series')
    
    # Plot fitted and forecasted values
    plt.plot(results.fittedvalues, color='red', label='Fitted Values')
    plt.plot(fc, color='red', linestyle='dashed', label='Forecasted Values') ## Forecasted values
    
    # Set labels and legend
    plt.xlabel('Months')
    plt.title('Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022')
    plt.legend()
    
    # Function to format the Y-axis values
    def y_format(value, tick_number):
      return f'{value:,.0f}'
    
    # Apply the formatter to the Y-axis
    plt.gca().yaxis.set_major_formatter(FuncFormatter(y_format))
    plt.show() 
    

The resulting output will look like this:

Time series plot titled Monthly consumption of coal for electricity generation in the United States from 2016 to 2022. Y-axis ranges from 0 to 100,000, x-axis from 0 to 100. Original data, fitted values, and forecasted values are plotted. The blue line represents the actual coal consumption, which fluctuates seasonally. The red line represents the fitted values, which smooth out the seasonal fluctuations, and the dashed red line represents the forecasted values. Coal consumption shows a general downward trend from around 100,000 tons per month in 2016 to around 20,000 tons per month in 2022.

Here, the blue curve is the original time series, and the red curve represents the ARIMA model, with forecasted values indicated by the dashed part of the curve. Although there is considerable error near the beginning of the time period, the model seems to fit the known values of the data fairly well, and so we expect that the future forecasts are reasonable.
