Principles of Data Science

5.3 Time Series Forecasting Methods


Learning Outcomes

By the end of this section, you should be able to:

  • 5.3.1 Analyze a time series by decomposing it into its components, including trend, seasonal and cyclic variation, and residual noise.
  • 5.3.2 Compute various kinds of moving averages, weighted moving averages, and related smoothing techniques.
  • 5.3.3 Describe the ARIMA procedure and how autocorrelation functions can be used to find appropriate parameters.
  • 5.3.4 Use ARIMA to make forecasts, with the aid of Python.

This section focuses on decomposing a time series into its components for the purpose of building a robust model for forecasting. Although much of the analysis can be done by hand, numerous software applications have been developed to handle this task. Ultimately, we will see how Python can be used to decompose a time series and create a general-purpose model called the autoregressive integrated moving average model (commonly known as ARIMA) to make forecasts.

Detrending the Data and Smoothing Techniques

Detrending a time series refers to the process of separating the underlying trend component $t_n$ from the time series as a whole through one of two complementary operations: isolating the trend so that it can be removed, or transforming the data so that the trend is flattened out, leaving only the seasonal and error components. Methods that reduce the error term and flatten out seasonal variations result in a trend-cycle component $t_n$ that still closely follows the long-term ups and downs in the time series. In this case, the other components of the time series can be found by removing the trend component:

$$x_n - t_n = s_n + \varepsilon_n$$

On the other hand, there are detrending methods that transform the data into a new time series with a trendline closer to a constant line at the level of 0. In other words, after applying such a transformation, the new time series, $z_n$, would consist of only the seasonal and noise components:

$$z_n = s_n + \varepsilon_n$$

Common methods for isolating or removing the trend or trend-cycle component include the following:

  1. Simple moving average (SMA). The simple moving average (SMA) is found by taking the average (arithmetic mean) of a fixed number of consecutive data points. This has the effect of smoothing out short-term variations in the data and isolating the trend. If the data has seasonal variation, then an SMA of the same length can effectively remove the seasonal component completely—there is more about this later in this section.
  2. Weighted moving average (WMA). There are several variations on the moving average idea, in which previous terms of the series are given different weights, thus termed weighted moving average (WMA). One of the most common weighted moving averages is the exponential moving average (EMA), in which the weights follow exponential decay, giving more recent data points much more weight than those further away in time. These techniques are more often used in forecasting as discussed later in this section, but they may also be employed as a smoothing method to isolate a trend in the data.
  3. Differencing. The (first-order) difference of a time series is found by differencing, or taking differences of consecutive terms—that is, the $n$th term of the difference would be $x_{n+1} - x_n$. If the original time series had a linear trend, then the difference would display a relatively constant trend with only noise and seasonal variation present. Similarly, the difference of the difference, or second-order difference, can transform a quadratic trend into a constant trend. Higher-order polynomial trends can be identified using higher-order differences. Thus, we can use differencing to check for an overall trend of a certain type (linear, quadratic, etc.) while at the same time removing the original trend from the time series.
  4. Regression. As mentioned in Example 5.1, linear regression may be applied to a time series that seems to display a uniform upward or downward trend. After differencing has identified a trend of some particular order, linear, quadratic, or higher-order polynomial regression can be used to find that trend.
  5. Seasonal-Trend decomposition using LOESS (STL). Seasonal-Trend decomposition using LOESS (STL) is a powerful tool that decomposes a time series into the trend-cycle, seasonal, and noise components. Later in this section we will find out how to perform STL using Python. (Note: LOESS is a regression technique that fits a low-order polynomial—typically linear or quadratic—to small portions of the data at a time. A full understanding of LOESS is not required for using STL.)

There are, of course, more complex methods for analyzing and isolating the trends of time series, but in this text we will stick to the basics and make use of technology when appropriate.

Simple Moving Averages (SMA)

Recall that we use the term window to refer to consecutive terms of a time series. In what follows, assume that the size of the window is a constant, $T$. Given a time series with values $(x_n)$, that is, a sequence $(x_1, x_2, x_3, x_4, \ldots, x_N)$, the simple moving average with window size $T$ is defined as a new series $(y_n)$ with values given by the formula

$$y_n = \frac{x_n + x_{n-1} + x_{n-2} + \cdots + x_{n-T+1}}{T} = \frac{1}{T}\sum_{k=1}^{T} x_{n-T+k}$$

This formula is not as difficult as it looks. Just think of it as the average of the last $T$ terms. For example, given the sequence (5, 4, 2, 6, 2, 3, 1, 7), the SMA of length 4 would be found by averaging 4 consecutive terms at a time. Note that the first term of the SMA is $y_4$ because at least 4 terms are needed in the average.

Term $k = 4$: $y_4 = \frac{5+4+2+6}{4} = \frac{17}{4} = 4.25$

Term $k = 5$: $y_5 = \frac{4+2+6+2}{4} = \frac{14}{4} = 3.5$

Term $k = 6$: $y_6 = \frac{2+6+2+3}{4} = \frac{13}{4} = 3.25$

Term $k = 7$: $y_7 = \frac{6+2+3+1}{4} = \frac{12}{4} = 3$

Term $k = 8$: $y_8 = \frac{2+3+1+7}{4} = \frac{13}{4} = 3.25$

Thus, the SMA of length 4 is the sequence (4.25, 3.5, 3.25, 3, 3.25). Note how the values of the SMA do not jump around as much as the values of the original sequence.
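If you want to check such computations by machine, pandas (used extensively later in this section) can reproduce them with the rolling() method. Here is a minimal sketch using the toy sequence above; the first three entries of the result are undefined because each average needs 4 terms.

    import pandas as pd
    
    # The toy sequence from the example above
    x = pd.Series([5, 4, 2, 6, 2, 3, 1, 7])
    
    # Rolling mean with window size 4; the first 3 entries are NaN
    # because each average needs 4 consecutive terms
    sma = x.rolling(window=4).mean()
    print(sma.dropna().tolist())   # [4.25, 3.5, 3.25, 3.0, 3.25]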

The number of terms to use in a simple moving average depends on the dataset and desired outcome. If the intent is to reduce noise, then a small number of terms (say, 3 to 5) may be sufficient. Sometimes inconsistencies in data collection can be mitigated by a judicious use of SMA. For example, suppose data is collected on the number of walk-in customers at the department of motor vehicles, which only operates Monday through Friday. No data is collected on the weekends, while the number of walk-ins on Mondays may be disproportionately high because people have had to wait until Monday to use their services. Here, an SMA of 7 days would smooth out the data, spreading the weekly walk-in data more evenly over all the days of the week.

Example 5.3

Problem

Find the simple moving average of window size 3 for the data from Table 5.1 and compare with the original line graph.

Simple moving averages provide an easy way to identify a trend-cycle curve in the time series. In effect, the SMA can serve as an estimate of the $t_n$ component of the time series. In practice, when using SMA to isolate the trend-cycle component, the terms of the SMA are centered, which means that when computing the SMA term $y_k$, the $x_k$ term shows up right in the middle of the window, with the same number of terms to the left and to the right.

For example, the SMA found in Example 5.3 may be shifted so that $y_k = \frac{x_{k-1} + x_k + x_{k+1}}{3}$. This has the effect of shifting the index of the SMA terms back 1 so that the SMA begins at index $k = 2$.

$$y_2 = \frac{x_1 + x_2 + x_3}{3} = \frac{1,848.36 + 2,058.90 + 2,043.94}{3} = 1,983.73$$

This procedure works well when the window size is odd, but when it is even, a common practice is to compute two SMAs in which $x_k$ is very nearly in the middle of the sequence and then average their results. For example, using the same data as in Example 5.3, if we wanted to compute a centered SMA of length 4, then the first term would be $y_3$, and we would find two averages.

$$\frac{x_1 + x_2 + x_3 + x_4}{4} = \frac{1,848.36 + 2,058.90 + 2,043.94 + 2,238.83}{4} = 2,047.51$$
$$\frac{x_2 + x_3 + x_4 + x_5}{4} = \frac{2,058.90 + 2,043.94 + 2,238.83 + 2,673.61}{4} = 2,253.82$$
$$y_3 = \frac{2,047.51 + 2,253.82}{2} = 2,150.67$$

The average of the two averages is equivalent to the following more efficient formula. First consider the previous example.

$$y_3 = \frac{1}{2}\left(\frac{x_1 + x_2 + x_3 + x_4}{4} + \frac{x_2 + x_3 + x_4 + x_5}{4}\right) = \frac{x_1}{8} + \frac{x_2}{4} + \frac{x_3}{4} + \frac{x_4}{4} + \frac{x_5}{8}$$

This suggests a general pattern for even window sizes. In summary, we have two closely related formulas for centered SMA, depending on whether $T$ is even or odd.

$$y_k = \frac{1}{T}\left(x_{k-(T-1)/2} + \cdots + x_{k+(T-1)/2}\right), \quad \text{for } T \text{ odd}$$
$$y_k = \frac{x_{k-T/2}}{2T} + \frac{1}{T}\left(x_{k-T/2+1} + \cdots + x_{k+T/2-1}\right) + \frac{x_{k+T/2}}{2T}, \quad \text{for } T \text{ even}$$

Of course, there are functions in statistical software that can work out the centered SMA automatically for you.
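For instance, pandas can compute a centered SMA directly with the center=True option of rolling(). The sketch below reproduces the centered averages worked out above, using the five values quoted from Table 5.1; note that for an even window, pandas does not apply the two-pass averaging automatically, so that step is added by hand.

    import pandas as pd
    
    # First five values from Table 5.1, as quoted in the text above
    x = pd.Series([1848.36, 2058.90, 2043.94, 2238.83, 2673.61])
    
    # Centered SMA with an odd window (T = 3): the label sits in the middle
    centered3 = x.rolling(window=3, center=True).mean()
    print(round(centered3[1], 2))   # 1983.73, matching y_2 above
    
    # Centered SMA with an even window (T = 4): average two adjacent
    # 4-term SMAs, which matches the two-pass formula in the text
    sma4 = x.rolling(window=4).mean()
    centered4 = sma4.rolling(window=2).mean().shift(-2)
    print(round(centered4[2], 2))   # 2150.66 (the 2,150.67 above, up to rounding)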

Differencing

In mathematics, the first-order difference of a sequence $(x_n) = (x_1, x_2, x_3, x_4, \ldots, x_N)$ is another sequence, denoted $\Delta(x_n)$, consisting of the differences of consecutive terms. That is,

$$\Delta(x_n) = (x_{n+1} - x_n) = (x_2 - x_1, x_3 - x_2, x_4 - x_3, \ldots, x_N - x_{N-1})$$

Note that there is one fewer term in the difference $\Delta(x_n)$ compared to the original time series $(x_n)$. Differencing has the effect of leveling out a uniform upward or downward trend without affecting other factors such as seasonal variation and noise.

Moreover, the average (arithmetic mean) of the terms of the difference series measures the rate of change of the original time series, much as the slope of the linear regression does. (Indeed, if you have seen some calculus, you might notice that there is a strong connection between differencing and computing the derivative of a continuous function.)

Example 5.4

Problem

Find the first-order difference time series for the data from Table 5.1 and compare with the original line graph. Find the average of the difference series and compare the value to the slope of the linear regression found in Example 5.1.

Differencing can be used to check whether a time series trend curve is essentially linear, quadratic, or polynomial of higher order. First, we need to recall a definition from algebra that is typically applied to polynomial functions but can also be applied to sequences. If a sequence $(x_n)$ is given by a polynomial formula of the form

$$x_n = f(n) = a_r n^r + a_{r-1} n^{r-1} + \cdots + a_1 n + a_0$$

where $r$ is a whole number, each of the coefficients $a_k$ (for $k = 0, 1, 2, \ldots, r$) is a constant, and the leading coefficient is nonzero ($a_r \neq 0$), then the sequence has order or degree $r$. (Here, order and degree refer to the degree of the polynomial; later in this section, order will also refer to the number of terms or lags used in a model to describe a time series.)

For example, any sequence of the form $x_n = an + b$ has order 1 (and may be called linear). A sequence of the form $x_n = an^2 + bn + c$ has order 2 (and is called quadratic). Quadratic data fits well to a parabola, which would indicate a curving trend in the data.

If a sequence $(x_n)$ has order $r > 0$, then the difference sequence $\Delta(x_n)$ has order $r - 1$. If the original sequence is already of order 0, which means the sequence is a constant sequence (all terms being the same constant), then the difference sequence simply produces another constant sequence (order 0) in which every term is equal to 0. This important result can be proved mathematically, but we will not present the proof in this text.

As an example, consider the sequence $(x_n) = (2, 5, 8, 11, 14, 17, 20)$, which has order 1, since you can find the values using the formula $x_n = 3n - 1$. The difference sequence is $\Delta(x_n) = (3, 3, 3, 3, 3, 3)$, which is a constant sequence, so it has order 0. Notice that taking the difference one more time yields the constant 0 sequence, $\Delta(\Delta(x_n)) = (0, 0, 0, 0, 0)$, and taking further differences would only reduce the number of terms, but each term would remain at the value 0. (For simplicity, we will assume that there are always enough terms in a sequence so that the difference is also a non-empty sequence.)

The second-order difference, $\Delta^2(x_n) = \Delta(\Delta(x_n))$, or difference of the difference, would reduce the order of an $r$-order sequence twice, to $r - 2$ if $r \geq 2$ or to the constant zero sequence if $r \leq 1$. In general, a $d$-order difference, defined by $\Delta^d(x_n) = \Delta(\Delta^{d-1}(x_n))$, reduces the order of an $r$-order sequence to $r - d$ if $r \geq d$ or to order 0 otherwise. With a little thought, you may realize that we now have a simple procedure to determine the order of a sequence that does not require us to have the formula for $x_n$ in hand.

If a sequence $(x_n)$ has a well-defined order (in other words, there is a polynomial formula that is a good model for the terms of the sequence, even if we do not know what that formula is yet), then the order is the smallest whole number $r$ such that the $(r+1)$-order difference, $\Delta^{r+1}(x_n)$, is equal to either the constant 0 sequence or white noise.

Example 5.5

Problem

Determine the order of the sequence (10, 9, 6, 7, 18, 45, 94, 171) using differencing.
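One way to carry out this check (and solve the problem above) is to difference repeatedly until the result is constant. Here is a minimal NumPy sketch using the sequence from the problem; with real, noisy data you would stop when the difference looks like white noise rather than an exactly constant sequence.

    import numpy as np
    
    x = np.array([10, 9, 6, 7, 18, 45, 94, 171])
    
    # Difference repeatedly until a constant sequence appears
    d = x
    for order in range(1, 5):
        d = np.diff(d)
        print(f'order {order}: {d}')
        if np.all(d == d[0]):   # constant sequence reached, so this is the order
            print(f'The sequence has order {order}.')
            break

Running the sketch shows the third-order difference is the constant sequence (6, 6, 6, 6, 6), so the sequence has order 3.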

We cannot hope to encounter real-world time series data that is exactly linear, quadratic, or any given degree. Every time series will have some noise in addition to possible seasonal patterns. However, the trend-cycle curve may still be well approximated by a polynomial model of some order.

SMA and Differencing in Excel

Excel can easily do simple moving averages and differencing. The ability to write one equation in a cell and then drag it down so that it works on all the rows of data is a very powerful and convenient feature. By way of example, let’s explore the dataset MonthlyCoalConsumption.xlsx. First, you will need to save the file as an Excel worksheet, as CSV files do not support equations.

Let’s include a centered 13-term window SMA (see Figure 5.12). In cell C8, type “=AVERAGE(B2:B14)” as shown in Figure 5.12. Note: There should be exactly 6 rows above and 6 rows below the center row at which the formula is entered.

A screenshot of an Excel worksheet with two columns of text labeled Month and Value. 13 data points in the Value column (B) are selected and the formula =AVERAGE(B2:B14) is applied in the formula bar.
Figure 5.12 Screenshot of a 13-term Excel Window SMA (Used with permission from Microsoft)

Next, click on the square in the lower right corner of cell C8 and drag it all the way down to cell C173, which is 6 rows before the final data value. Column C now contains the SMA time series for your data, as shown in Figure 5.13.

A screenshot of an Excel worksheet with three columns of text labeled Month, Value, and SMA. Columns A and B have 19 rows of data. Column C, starting in row 8, shows the SMA time series.
Figure 5.13 Screenshot Showing the SMA Time Series in Column C (Used with permission from Microsoft)

You can generate the graph of the time series and the SMA together, which is shown in Figure 5.14. Compare your computed SMA with the trendline created by Excel in Figure 5.7.

A line chart titled Monthly consumption of coal for electricity generation in the United States from 2016 to 2022. The X axis has months from January 2016 to May 2022. The Y axis ranges from 0 to 80,000. The blue line represents the actual coal consumption, which fluctuates seasonally with peaks in winter and valleys in summer. The orange line represents the trend of coal consumption over time, showing a general downward trend from around 80,000 tons per month in 2016 to around 40,000 tons per month in 2022.
Figure 5.14 Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022. The SMA of window length 13 has been added to the graph in orange. (data source: https://www.statista.com/statistics/1170053/monthly-coal-energy-consumption-in-the-us/)

Let’s now work out the first-order difference. The idea is very similar. Put the formula for the difference, which is nothing more than “=B3-B2,” into cell D3, as in Figure 5.15.

A screenshot of an Excel worksheet with four columns labeled Month, Value, SMA, and Difference. Cells B2 and B3 are selected and the Difference formula =B3-B2 is applied in the formula bar.
Figure 5.15 Screenshot of Excel Showing Formula for Difference in Column D (Used with permission from Microsoft)

Then drag the formula down to the end of the column. The result is shown in Figure 5.16.

A screenshot of an Excel worksheet with four columns labeled Month, Value, SMA, and Difference. Row D shows that the Difference formula =B3-B2 has been applied to the whole column.
Figure 5.16 Screenshot of Excel Showing Difference in Column D (Used with permission from Microsoft)

If you need to compute the second-order differences, just do the same procedure to find the difference of column D. Higher-order differences can easily be computed in the analogous way.

SMA and Differencing in Python

Computing an SMA in Python takes only a single line of code. Here is how it works on the Coal Consumption dataset (found in MonthlyCoalConsumption.xlsx), using a window size of 12. (Before, we used a window of 13 because odd-length windows are easier to work with than even-length ones. However, Python easily handles all cases automatically.) The key here is the function rolling() that creates a DataFrame of rolling windows of specified length. Then the mean() command computes the average of each window of data. Computing the first-order difference is just as easy, using the diff() command.

Python Code

    
    # Import libraries
    import pandas as pd # for dataset management
    import matplotlib.pyplot as plt # for visualizations
    from matplotlib.ticker import MaxNLocator, FuncFormatter # for graph formatting and y-axis formatting
    import matplotlib.dates as mdates # for date formatting
    
    # Read the Excel file into a Pandas DataFrame
    df = pd.read_excel('MonthlyCoalConsumption.xlsx')
    
    # Convert 'Month' column to datetime format
    df['Month'] = pd.to_datetime(df['Month'], format='%m/%d/%Y')
    
    # Calculate a simple moving average with a window size of 12
    df['SMA'] = df['Value'].rolling(window=12).mean()
    
    # Calculate the first-order difference
    df['Diff'] = df['Value'].diff()
    
    # Function to format the Y-axis values
    def y_format(value, tick_number):
      return f'{int(round(value)):,}'
    
    # Create the plot
    plt.figure(figsize=(12, 6))
    
    # Plot the time series, SMA, and difference
    plt.plot(df['Month'], df['Value'], marker='', linestyle='-', color='b', label='Value')
    plt.plot(df['Month'], df['SMA'], marker='', linestyle='-', color='r', label='SMA')
    plt.plot(df['Month'], df['Diff'], marker='', linestyle='-', color='g', label='Diff')
    
    # Set the x-axis limits to start from the first date and end at the maximum date in the data
    plt.xlim(pd.to_datetime('01/01/2016', format='%m/%d/%Y'), df['Month'].max())
    
    # Format the x-axis to show dates as MM/DD/YYYY and set the locator for a fixed number of ticks
    ax = plt.gca()
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y'))
    ax.xaxis.set_major_locator(MaxNLocator(nbins=12)) # Set the number of ticks to a reasonable value
    
    # Manually set the rotation of date labels to 45 degrees
    plt.xticks(rotation=45)
    
    # Title and labels
    plt.title('Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022')
    plt.xlabel('Month')
    plt.legend()
    
    # Apply the formatter to the Y-axis
    ax.yaxis.set_major_formatter(FuncFormatter(y_format))
    
    plt.tight_layout() # Adjust layout to ensure labels fit well
    plt.show() 
    

The resulting output will look like this:

A line chart titled Monthly consumption of coal for electricity generation in the United States from 2016 to 2022. The X axis has months from July 2016 to January 2022. The Y axis ranges from -20,000 to 60,000. The blue line represents the actual coal consumption, which fluctuates seasonally. The red line represents the 12-month simple moving average (SMA), which smooths out the seasonal fluctuations. The green line represents the first-order difference of the series, which fluctuates around zero. Coal consumption shows a general downward trend over the period.

Analyzing Seasonality

Once the time series has been detrended using one of the methods already discussed (or other, more sophisticated methods), what is left over should represent only the seasonality, or seasonal variation, $s_n$, and error term, $\varepsilon_n$.

$$z_n = s_n + \varepsilon_n$$

How could we analyze and model $s_n$? Suppose there were no error term at all ($\varepsilon_n = 0$), so that we are looking at a time series with pure seasonal variation and no other factors. In this case, the component $s_n$ would look like a repeating pattern, such as the pattern shown in Figure 5.17. As mentioned in Components of Time Series Analysis, the time interval that it takes before the pattern completes once and begins to repeat is called the period. In a purely seasonal component, we would expect $s_{n+P} = s_n$, where $P$ is the period, for all values of $n$.

A line chart. The X axis ranges from 1 to 225. The Y axis ranges from -5 to 5. A blue line shows ups and downs representing a pure seasonal variation. The same pattern repeats after 64 time steps.
Figure 5.17 Pure Seasonal Variation. The same pattern repeats after 64 time steps, and so the period is 64.

While it is easy to spot the repeating pattern in a time series that does not have noise, the same cannot be said for the general case. There are several tools available for isolating the seasonal component of a detrended time series.

Identifying the Period

First we need to identify the period of the seasonal data. The simplest method is to use prior knowledge about the time series data, along with a visual inspection. Often, a seasonal pattern is exactly that—a pattern that repeats over the course of one year. Thus, the period would be $P = 1$ year (or 12 months, 52 weeks, 365 days, etc., depending on the time steps of your time series). For example, even without detrending the data, it was easy to identify a period of 4 quarters (1 year) in the data in Example 5.2.

If repetition of a seasonal pattern is not apparent from the data, then it may be discovered using standard statistical measures such as autocorrelation. Autocorrelation is the measure of correlation between a time series and the same time series shifted by some number of time steps. (Correlation and Linear Regression Analysis discussed correlation between two sets of data. Autocorrelation is the same idea applied to the same dataset with a shifted copy of itself.) The size of the shift is called the lag.

The autocorrelation function (ACF) measures the correlation at each lag starting with 0 or 1. Note, lag 0 always produces an autocorrelation of 1 since the time series is being compared to an exact copy of itself. Thus, we are only interested in the correlation coefficient at positive lags. However, small lags generally show high correlation with the original time series as well, simply because data measured within shorter time periods are generally similar to each other. So, even if the ACF shows relatively large correlation at lags of 1, 2, 3, etc., there may not be a true seasonal component present. The positive lag corresponding to the first significant peak, or autocorrelation value that is much larger than the previous values, may indicate a seasonal component with that period.

Later, we will see how to compute ACF of a time series using Python, but first let’s explore further topics regarding seasonal components.

Seasonally Adjusted Data

Once you have determined the period of the seasonal component, you can then extract snsn from the time series. As usual, there are a number of different ways to proceed.

Let’s assume that you have detrended the time series so that what remains consists of a seasonal component and noise term:

$$z_n = x_n - t_n = s_n + \varepsilon_n$$

Moreover, suppose that the period of $s_n$ is $P$. Of course, with the noise term still present in the time series, we cannot expect that $s_{n+P} = s_n$ holds exactly, but it should be the case that $s_{n+P} \approx s_n$ for every $n$. In other words, the data should look approximately the same when shifted by $P$ time steps. Our job now is to isolate one complete period of data that models the seasonal variation throughout the time series. In other words, we want to produce an estimate $\hat{s}_n$ that is exactly periodic ($\hat{s}_{n+P} = \hat{s}_n$, for all $n$) and models the underlying seasonal component as closely as possible.

For any given index $n \leq P$, we expect the data points $(z_n, z_{n+P}, z_{n+2P}, z_{n+3P}, \ldots)$ to hover around the same value or, in statistical language, to have rather low variance (much lower than the time series as a whole, that is). So it seems reasonable to use the mean of this slice of data to define the value of $\hat{s}_n$. For simplicity, assume that the time series has exactly $M \cdot P$ data points (if not, then the data can be padded in an appropriate way). Then the estimate for the seasonal component would be defined for $1 \leq n \leq P$ by the formula

$$\hat{s}_n = \frac{1}{M}\left(z_n + z_{n+P} + z_{n+2P} + z_{n+3P} + \cdots + z_{n+(M-1)P}\right)$$

As usual, the formula looks more intimidating than the concept that it's based upon. As previously mentioned, this represents nothing more than the average of all the values of the time series that occur every $P$ time steps.

Once $\hat{s}_n$ is computed in the range $1 \leq n \leq P$, extend it by repeating the same values in each period until you reach the last index of the original series ($n = MP$). (Note: The noise term may now be estimated as the series of residuals, $\varepsilon_n \approx z_n - \hat{s}_n$, which will occupy our interest in Forecast Evaluation Methods.)

Finally, the estimate $\hat{s}_n$ can then be used in conjunction with the original time series $(x_n)$ to smooth out the seasonal variation. Note that $\hat{s}_n$ essentially measures the fluctuations of the observed data $x_n$ above or below the trend. We expect the mean of $\hat{s}_n$ to be close to zero in any window of time. Define the seasonally adjusted data as the time series

$$x_n - \hat{s}_n$$

A seasonally adjusted time series is closely related to the underlying trend-cycle curve, $t_n$, though it generally still contains the noise and other random factors that $t_n$ does not include.
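To make the averaging procedure concrete, here is a minimal pandas sketch on a made-up detrended series with period $P = 4$; the series z and all of its values are fabricated purely for illustration.

    import numpy as np
    import pandas as pd
    
    P = 4                                     # assumed seasonal period
    M = 6                                     # number of complete periods
    rng = np.random.default_rng(0)
    
    # Fabricated detrended series: a repeating pattern plus noise
    z = pd.Series(np.tile([5.0, -2.0, -4.0, 1.0], M) + rng.normal(0, 0.5, M * P))
    
    # Average the values that occur every P steps:
    # s_hat[n] = mean(z[n], z[n+P], z[n+2P], ...)
    s_hat_period = z.groupby(z.index % P).mean()
    
    # Extend the one-period estimate periodically over the whole series
    s_hat = pd.Series(np.tile(s_hat_period.values, M))
    
    # The residuals estimate the noise term; subtracting s_hat from the
    # original (untrended) series would give the seasonally adjusted data
    residuals = z - s_hat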

Example 5.6

Problem

The revenue data from Walt's Water Adventures (see Table 5.2) seems to display seasonality of period 4. Using the level as the trend, detrend the data, compute the seasonal component $\hat{s}_n$, and find the seasonally adjusted time series.

Decomposing a Time Series into Components Using Python

There is a powerful Python library known as statsmodels that collects many useful statistical functions together. We will use the STL module to decompose the time series MonthlyCoalConsumption.xlsx into its components. First, we plot an ACF (autocorrelation function) to determine the best value for periodicity.

Python Code

    # Import libraries
    import pandas as pd # for dataset management
    from statsmodels.graphics.tsaplots import plot_acf
    
    # Read the Excel file into a Pandas Dataframe
    df = pd.read_excel('MonthlyCoalConsumption.xlsx')
    
    # Plot the ACF
    plot_acf(df['Value'], lags=20, title='ACF');
    

The resulting output will look like this:

ACF plot with 21 lags. The y-axis ranges from -1 to 1 and the x-axis from 0 to 20. The plot shows autocorrelation values for each lag, with confidence intervals indicated by shaded areas.

Points that are within the blue shaded area correspond to lags at which the autocorrelation is statistically insignificant. The lags that show the most significance are at 0, 1, 6, and 12. Lag 0 always gives 100% autocorrelation because we are comparing the unshifted series to itself. At lag 1, a high autocorrelation corresponds to the fact that consecutive data points are relatively close to each other (which is also to be expected in most datasets). Of special note are the lags at 6 and 12, corresponding to a possible seasonal period of 6 or 12 in the data. We will use period 12 in our STL decomposition.

Python Code

    # Import libraries
    from statsmodels.tsa.seasonal import STL
    import matplotlib.pyplot as plt ## for data visualization
    from matplotlib.ticker import FuncFormatter ## for the y-axis (reuses y_format defined earlier)
    
    # Perform seasonal decomposition using STL
    stl = STL(df['Value'], period=12)
    result = stl.fit()
    
    # Plot components and format the y-axis labels
    fig = result.plot()
    for ax in fig.axes:
      ax.yaxis.set_major_formatter(FuncFormatter(y_format))
    
    plt.show()
    

The resulting output will look like this:

 
A time series decomposition plot labeled Value with four stacked subplots sharing an x-axis from 0 to 75: the original data, the trend (about 40,000 to 60,000), the seasonal component (about -10,000 to 10,000), and the residuals (about -5,000 to 5,000).

The trend, seasonal variation, and residual (noise) components are all graphed in separate plots. To see them all on the same set of axes, we can use matplotlib (noise omitted).

Python Code

    import matplotlib.pyplot as plt ## for data visualization
    
    # Extract the trend and seasonal components
    trend = result.trend
    seasonal = result.seasonal
    
    # Plot the original data, trend, and seasonally adjusted data
    plt.plot(df['Value'], label='Original Data')
    plt.plot(trend, label='Trend')
    plt.plot(seasonal, label='Seasonal')
    plt.legend()
    
    # Apply the formatter to the Y-axis
    plt.gca().yaxis.set_major_formatter(FuncFormatter(y_format))
    plt.show()
    

The resulting output will look like this:

A time series plot showing original data, trend, and seasonal components. Y-axis ranges from 0 to 70,000, x-axis from 0 to 80. Trend line is sloping downward and seasonal component fluctuates around zero.

Weighted Moving Averages and Exponential Moving Averages

Simple moving averages were discussed in detail at the beginning of this section. Now we expand the discussion to include more sophisticated weighted moving averages and their uses in smoothing data and forecasting. Recall from earlier in this section that weighted moving averages are moving averages in which terms are given weights according to some formula or rule.

Suppose that a time series $(x_n)$ has no significant seasonal component (or the seasonal component has been removed so that the remaining series is seasonally adjusted). Thus, the series has only trend-cycle and noise components:

$$x_n = t_n + \varepsilon_n$$

As already discussed, a simple moving average (SMA) has the effect of smoothing out the data and providing an easy method for predicting the next term of the sequence. For reference, here is the SMA prediction model, with a window of $T$ terms:

$$\hat{x}_{n+1} = \frac{x_n + x_{n-1} + x_{n-2} + \cdots + x_{n-T+1}}{T}$$

This model may be regarded as a weighted moving average model (WMA) in which the weights of the most recent $T$ terms are all equal to $\frac{1}{T}$, while all other weights are set to 0. For example, in an SMA model with $T = 3$, the weights would be $\frac{1}{3}, \frac{1}{3}, \frac{1}{3}, 0, 0, 0, \ldots$

$$\hat{x}_{n+1} = \frac{x_n + x_{n-1} + x_{n-2}}{3} = \frac{1}{3}x_n + \frac{1}{3}x_{n-1} + \frac{1}{3}x_{n-2} + 0 \cdot x_{n-3} + 0 \cdot x_{n-4} + \cdots$$

Even the naïve model, $\hat{x}_{n+1} = x_n$, counts as a WMA, with weights 1, 0, 0, …

In principle, a weighted moving average model may be built using any coefficients, giving a general form for any WMA:

$$\hat{x}_{n+1} = a_n x_n + a_{n-1} x_{n-1} + a_{n-2} x_{n-2} + a_{n-3} x_{n-3} + \cdots$$

Of course, if we want the model to do a good job in forecasting new values of the time series, then the coefficients must be chosen wisely. Typically, the weights must be normalized, which means that they sum to 1.

$$a_n + a_{n-1} + a_{n-2} + a_{n-3} + \cdots = 1$$

However, there is no hard-and-fast requirement that the sum must be equal to 1.

Rather than giving the same weight to some number of previous terms as the SMA model does, you might choose to give greater weight to more recent terms. For example, one might use powers of $\frac{1}{2}$ as weights.

$$\hat{x}_{n+1} = \frac{1}{2}x_n + \frac{1}{4}x_{n-1} + \frac{1}{8}x_{n-2} + \frac{1}{16}x_{n-3} + \cdots$$

This method allows the most recent term to dominate the sum, and each successive term contributes half as much in turn. (Note: The sequence $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \ldots$ would be a normalized sequence if infinitely many terms were included, but even a small finite number of terms, say 5 or so, adds up to a sum that is quite close to 1.)

This is an example of an exponential moving average, or EMA. The most general formula for the EMA model is shown below. (Note that the previous example corresponds to an EMA model with $\alpha = \frac{1}{2}$.)

$$\hat{x}_{n+1} = \alpha x_n + \alpha(1-\alpha)x_{n-1} + \alpha(1-\alpha)^2 x_{n-2} + \alpha(1-\alpha)^3 x_{n-3} + \cdots$$

Example 5.7

Problem

Use EMA models with $\alpha = \frac{1}{2}$ (equivalently, 0.5) and $\alpha = 0.7$ for the S&P Index data from Table 5.1 to estimate the value of the S&P Index at the end of year 2023.

EMA models may also serve as smoothing techniques, isolating the trend-cycle component. In this context, it’s better to use an equivalent form of the EMA model:

$$\hat{x}_{n+1} = \alpha x_n + (1-\alpha)\hat{x}_n$$

In this form, we build a series of estimates $(\hat{x}_n)$ recursively by incorporating the terms of the original time series one at a time. The idea is that the next estimate, $\hat{x}_{n+1}$, is found as a weighted average of the current data point, $x_n$, and the current estimate, $\hat{x}_n$. Note that we may set $\hat{x}_1 = x_1$ since we need to begin the process somewhere. It can be shown mathematically that this procedure is essentially the same as working out the sum of exponentially weighted terms directly, and it is much more efficient.
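In pandas, for example, this recursion is what the ewm() method computes when adjust=False is set; its output at time step $n$ matches the estimate $\hat{x}_{n+1}$, one step later. Here is a short sketch on a made-up series comparing the explicit loop with the pandas result.

    import pandas as pd
    
    x = pd.Series([5.0, 4.0, 2.0, 6.0, 2.0, 3.0])   # made-up data for illustration
    alpha = 0.75
    
    # The recursion from the text: x_hat(n+1) = alpha*x(n) + (1 - alpha)*x_hat(n)
    x_hat = [x.iloc[0]]                              # start with x_hat_1 = x_1
    for xn in x.iloc[:-1]:
        x_hat.append(alpha * xn + (1 - alpha) * x_hat[-1])
    
    # pandas computes the same recursion, shifted by one time step
    ema = x.ewm(alpha=alpha, adjust=False).mean()
    print(x_hat[1:])                 # [5.0, 4.25, 2.5625, 5.140625, 2.78515625]
    print(ema.iloc[:-1].tolist())    # identical values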

Example 5.8

Problem

Use the recursive EMA formula with $\alpha = 0.75$ to smooth the S&P Index time series (Table 5.1), and then use the EMA model to estimate the value of the S&P Index at the end of 2024. Plot the EMA together with the original time series.

Autoregressive Integrated Moving Average (ARIMA)

In this section, we develop a very powerful tool that combines multiple models together and is able to capture trend and seasonality. The method is called autoregressive integrated moving average, or ARIMA for short, and it consists of three main components:

  1. Autoregressive (AR) model. This component captures patterns in the time series that are repeated, including seasonal components as well as correlation with recent values.
  2. Integrative (I) component. This component represents the differencing operation discussed in the section on detrending a time series.
  3. Moving average (MA) model. Despite the name, this component is not the same as the moving average discussed previously, although there is a mathematical connection. Instead of focusing on averages of existing data points, MA models incorporate the influence of past forecast errors to predict the next value of the time series.

Before getting into the specifics of each component of ARIMA, we need to talk about the term stationary, which refers to a time series in which the variance is relatively constant over time, an overall upward or downward trend cannot be found, and no seasonal patterns exist. White noise is an example of a stationary time series; however, the term stationary is a bit more general, as it includes the possibility of a cyclic component in the data. Recall, a cyclic component (despite its name) is a longer-term rise or fall in the values of a time series that may occur from time to time, but the occurrences are unpredictable.

The AR and MA models work best on stationary time series. The integrative component is the tool that can detrend the data, transforming a nonstationary time series into a stationary one so that AR and/or MA may be used.

Autoregressive Models

The autoregressive models take into account a fixed number of previous values of the time series in order to predict the next value. In this way, AR models are much like WMA models previously discussed, in the sense that the next term of the time series depends on a sum of weighted terms of previous values; however, in the AR model, the terms used are the terms of the model itself.

Suppose we want to model a time series, $(x_n)$, using an AR model. First, we should pick a value $p$, called the order of the model, which specifies how many previous terms to include in the model. Then we may call the model an AR($p$) model. If there is a strong seasonal component of period $p$, then it makes sense to choose that value as the order.

An AR($p$) model is a recursive formula of the form

$$AR(p)_n = w_1 y_{n-1} + w_2 y_{n-2} + w_3 y_{n-3} + \cdots + w_p y_{n-p} + \varepsilon_n$$

Here, the parameters or weights, $w_1, w_2, w_3, \ldots, w_p$, are constants that must be chosen so that the values of the time series $(y_n)$ fit the known data as closely as possible (i.e., $y_n \approx x_n$ for all known observations). There are also certain restrictions on the possible values of each weight, but we do not need to delve into those specifics in this text, as we will rely on software to do the work of finding the parameters for us.
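Although we will use the full ARIMA interface later in this section, statsmodels also offers an AutoReg class for fitting a pure AR($p$) model. Here is a minimal sketch on simulated data; the coefficients 0.6 and 0.3 are chosen arbitrarily for illustration.

    import numpy as np
    from statsmodels.tsa.ar_model import AutoReg
    
    # Simulate an AR(2) process: x_n = 0.6*x_(n-1) + 0.3*x_(n-2) + noise
    rng = np.random.default_rng(1)
    x = np.zeros(300)
    for n in range(2, 300):
        x[n] = 0.6 * x[n - 1] + 0.3 * x[n - 2] + rng.normal()
    
    # Fit an AR(2) model; the fitted weights should be near 0.6 and 0.3
    results = AutoReg(x, lags=2).fit()
    print(results.params)    # intercept followed by the two lag weights
    
    # One-step-ahead forecast of the next (unobserved) value
    print(results.predict(start=len(x), end=len(x)))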

Integrative Component

The “I” in ARIMA stands for integrative, which is closely related to the word integral in calculus. Now if you have seen some calculus, you might recall that the integral is just the opposite of a derivative. The derivative, in turn, is closely related to the idea of differencing in a time series. Of course, you don’t have to know calculus to understand the idea of an integrative model.

Rather than producing another component of the time series, the integrative component of ARIMA represents the procedure of taking differences, $\Delta^d(x_n)$ (up to some specific order $d$), until the time series becomes stationary. Often, only the first-order difference is needed, and you can check for stationarity visually. A more sophisticated method, called the Augmented Dickey-Fuller (ADF) test, sets up a test statistic that measures the presence of non-stationarity in a time series. Using ADF, a p-value is computed. If the p-value is below a set threshold (such as $p < 0.01$), then the null hypothesis of non-stationarity can be rejected. Next, we will show how to run the ADF test on a time series and its differences $\Delta^d(x_n)$ to find the best value for $d$ in ARIMA.

Then the AR and MA models can be produced on the resulting stationary time series. Integrating (which is the opposite of differencing) builds the model back up so that it can ultimately be used to predict values of the original time series (xn)(xn).

Moving Average Models

An MA model of order $q$, or MA($q$), takes the form

$$MA(q)_n = L + \varepsilon_n + b_1\varepsilon_{n-1} + b_2\varepsilon_{n-2} + b_3\varepsilon_{n-3} + \cdots + b_q\varepsilon_{n-q}$$

Here, the $\varepsilon_n$ terms are residuals (forecast errors), and $L$ is the level of the time series (the mean value of all the terms). Again, there are a multitude of parameters that need to be chosen in order for the model to make sense. We will rely on software to handle this in general.
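A pure MA($q$) model can be fit in statsmodels as the special case ARIMA(0, 0, $q$). The brief sketch below simulates and fits an MA(1) series; the level 10 and coefficient 0.5 are arbitrary illustration values.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA
    
    # Simulate an MA(1) process: x_n = L + eps_n + b1*eps_(n-1)
    rng = np.random.default_rng(2)
    eps = rng.normal(size=300)
    x = 10 + eps[1:] + 0.5 * eps[:-1]    # level L = 10, b1 = 0.5
    
    # A pure MA(1) model is ARIMA with p = 0 and d = 0
    results = ARIMA(x, order=(0, 0, 1)).fit()
    print(results.params)    # estimated level, MA(1) coefficient, noise variance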

ARIMA Model in Python

The full ARIMA model, ARIMA($p, d, q$), incorporates all three components—autoregressive of order $p$, integrative with $d$-order differencing, and moving average of order $q$. For the time series MonthlyCoalConsumption.xlsx, it turns out that the second-order difference is stationary, as shown by a series of ADF tests.

Python Code

    # Import libraries
    import pandas as pd ## for dataset management
    # Import adfuller from the statsmodels library
    from statsmodels.tsa.stattools import adfuller
    
    # Read data
    df = pd.read_excel('MonthlyCoalConsumption.xlsx')
    
    print('d = 0')
    result = adfuller(df['Value'])
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
    
    ## Get the first difference
    print('\n d = 1')
    df1 = df['Value'].diff().dropna()
    result = adfuller(df1)
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
    
    ## Get the second difference
    print('\n d = 2')
    df2 = df1.diff().dropna()
    result = adfuller(df2)
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
    

The resulting output will look like this:

d = 0
ADF Statistic: -1.174289795062662
p-value: 0.6845566772896323

d = 1
ADF Statistic: -1.7838215415905252
p-value: 0.38852291349101137

d = 2
ADF Statistic: -8.578323839498081
p-value: 7.852009937900099e-14

Here, a p-value of $7.852 \times 10^{-14}$ indicates that the second-order difference is stationary, so we will set $d = 2$ in the ARIMA model. An ACF plot of the second difference can be used to determine $p$.

Python Code

    # Import libraries
    from statsmodels.graphics.tsaplots import plot_acf
    
    # Plot the ACF of the second difference
    plot_acf(df2, lags=20, title='ACF for d = 2 Difference');
     
ACF plot labeled ACF for d = 2 Difference with 21 lags showing autocorrelation values for differenced data. The y-axis ranges from -1 to 1 and the x-axis from 0 to 20. Confidence intervals are indicated by shaded areas.

We see a peak at lag 12, corresponding to the expected yearly seasonal variation, so we will set $p = 12$ in the ARIMA model. For this time series, positive values of $q$ did not change the accuracy of the model at all, so we set $q = 0$ (the MA model component is unnecessary). The model was used to forecast 24 time steps (2 years) into the future. We are using the Python library statsmodels.tsa.arima.model to create the ARIMA model.

Python Code

    # Import ARIMA model from statsmodels.tsa library
    from statsmodels.tsa.arima.model import ARIMA
    import matplotlib.pyplot as plt ## for plotting graphs
    from matplotlib.ticker import MaxNLocator ## for graph formatting
    from matplotlib.ticker import FuncFormatter ## for formatting y-axis
    
    # Fit an ARIMA model
    p = 12; d = 2; q = 0 # ARIMA component orders
    model = ARIMA(df['Value'], order=(p, d, q))
    results = model.fit()
    
    # Make forecasts
    fc = results.forecast(steps=24)
    
    # Plot the results
    plt.figure(figsize=(10, 6))
    
    # Plot original time series
    plt.plot(df['Value'], label='Original Time Series')
    
    # Plot fitted and forecasted values
    plt.plot(results.fittedvalues, color='red', label='Fitted Values')
    plt.plot(fc, color='red', linestyle='dashed', label='Forecasted Values') ## Forecasted values
    
    # Set labels and legend
    plt.xlabel('Months')
    plt.title('Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022')
    plt.legend()
    
    # Function to format the Y-axis values
    def y_format(value, tick_number):
      return f'{value:,.0f}'
    
    # Apply the formatter to the Y-axis
    plt.gca().yaxis.set_major_formatter(FuncFormatter(y_format))
    plt.show() 
    

The resulting output will look like this:

Time series plot titled Monthly consumption of coal for electricity generation in the United States from 2016 to 2022. Y-axis ranges from 0 to 100,000, x-axis from 0 to 100. Original data, fitted values, and forecasted values are plotted. The blue line represents the actual coal consumption, which fluctuates seasonally. The red line represents the fitted values, which smooth out the seasonal fluctuations, and the dashed red line represents the forecasted values. Coal consumption shows a general downward trend from around 100,000 tons per month in 2016 to around 20,000 tons per month in 2022.

Here, the blue curve is the original time series, and the red curve represents the ARIMA model, with forecasted values indicated by the dashed part of the curve. Although there is considerable error near the beginning of the time period, the model seems to fit the known values of the data fairly well, and so we expect that the future forecasts are reasonable.
