Learning Outcomes
By the end of this section, you should be able to:
- 5.3.1 Analyze a time series by decomposing it into its components, including trend, seasonal and cyclic variation, and residual noise.
- 5.3.2 Compute various kinds of moving averages, weighted moving averages, and related smoothing techniques.
- 5.3.3 Describe the ARIMA procedure and how autocorrelation functions can be used to find appropriate parameters.
- 5.3.4 Use ARIMA to make forecasts, with the aid of Python.
This section focuses on analysis of a time series into its components for the purpose of building a robust model for forecasting. Although much of the analysis can be done by hand, numerous computer software applications have been developed to handle this task. Ultimately, we will see how Python can be used to decompose a time series and create a general-purpose model called the autoregressive integrated moving average model (commonly known as ARIMA) to make forecasts.
Detrending the Data and Smoothing Techniques
Detrending a time series refers to the process of separating the underlying trend component from the time series as a whole through one of two complementary operations: isolating the trend component so that it can be subtracted, or transforming the series so that the trend is removed directly, leaving only the seasonal and error components. Methods that reduce the error term and flatten out seasonal variations result in a trend-cycle component $T_t$ that still closely follows the long-term ups and downs in the time series. In this case, the other components of the time series can be found by removing the trend component:

$$Y_t - T_t = S_t + E_t$$
On the other hand, there are detrending methods that transform the data into a new time series with a trendline closer to a constant line at the level of 0. In other words, after applying such a transformation, the new time series, $Y'_t$, would consist of only the seasonal and noise components:

$$Y'_t = S_t + E_t$$
Common methods for isolating or removing the trend or trend-cycle component include the following:
- Simple moving average (SMA). The simple moving average (SMA) is found by taking the average (arithmetic mean) of a fixed number of consecutive data points. This has the effect of smoothing out short-term variations in the data and isolating the trend. If the data has seasonal variation, then an SMA whose window length equals the period of the seasonal variation can effectively remove the seasonal component completely—there is more about this in Time Series Forecasting Methods.
- Weighted moving average (WMA). There are several variations on the moving average idea, in which previous terms of the series are given different weights, thus termed weighted moving average (WMA). One of the most common weighted moving averages is the exponential moving average (EMA), in which the weights follow exponential decay, giving more recent data points much more weight than those further away in time. These techniques are more often used in forecasting as discussed later in this section, but they may also be employed as a smoothing method to isolate a trend in the data.
- Differencing. The (first-order) difference of a time series is found by differencing, or taking differences of consecutive terms—that is, the $n$th term of the difference would be $y_n - y_{n-1}$. If the original time series had a linear trend, then the difference would display a relatively constant trend with only noise and seasonal variation present. Similarly, the difference of the difference, or second-order difference, can transform a quadratic trend into a constant trend. Higher-order polynomial trends can be identified using higher-order differences. Thus, we can use differencing to check for the overall trend of a certain type (linear, quadratic, etc.) while at the same time removing the original trend from the time series.
- Regression. As mentioned in Example 5.1, linear regression may be applied to a time series that seems to display a uniform upward or downward trend. After differencing has been used to identify whether there is a trend of some particular order, linear, quadratic, or higher-order polynomial regression can be used to find that trend (see the sketch following this list).
- Seasonal-Trend decomposition using LOESS (STL). Seasonal-Trend decomposition using LOESS (STL) is a powerful tool that decomposes a time series into the trend-cycle, seasonal, and noise components. Later in this section we will find out how to perform STL using Python. (Note: LOESS is a regression technique that fits a low-order polynomial—typically linear or quadratic—to small portions of the data at a time. A full understanding of LOESS is not required for using STL.)
There are, of course, more complex methods for analyzing and isolating the trends of time series, but in this text we will stick to the basics and make use of technology when appropriate.
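To make the regression idea concrete, here is a minimal sketch (not from the original text) that detrends a made-up linear-trend-plus-noise series by fitting and subtracting a degree-1 polynomial with NumPy; the simulated data and variable names are illustrative assumptions.
Python Code
import numpy as np

# Made-up series with a linear trend plus noise (illustration only)
rng = np.random.default_rng(0)
t = np.arange(50)
y = 10 + 3.0 * t + rng.normal(scale=5, size=50)

# Fit a degree-1 polynomial (the linear trend) and subtract it from the series
slope, intercept = np.polyfit(t, y, deg=1)
trend = slope * t + intercept
detrended = y - trend   # what remains is (approximately) seasonal variation plus noise
print(round(slope, 2), round(intercept, 2))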
Simple Moving Averages (SMA)
Recall that we use the term window to refer to consecutive terms of a time series. In what follows, assume that the size of the window is a constant, $m$. Given a time series with values $y_1, y_2, \ldots, y_N$, that is, a sequence $(y_n)$, the simple moving average with window size $m$ is defined as a new series with values $s_n$ given by the formula

$$s_n = \frac{y_{n-m+1} + y_{n-m+2} + \cdots + y_n}{m}, \qquad n = m, m+1, \ldots, N$$
This formula is not as difficult as it looks. Just think of it as the average of the last $m$ terms. For example, given the sequence (5, 4, 2, 6, 2, 3, 1, 7), the SMA of length 4 would be found by averaging 4 consecutive terms at a time. Note that the first term of the SMA is $s_4$ because at least 4 terms are needed in the average.
Term $n = 4$: $s_4 = \frac{5 + 4 + 2 + 6}{4} = 4.25$
Term $n = 5$: $s_5 = \frac{4 + 2 + 6 + 2}{4} = 3.5$
Term $n = 6$: $s_6 = \frac{2 + 6 + 2 + 3}{4} = 3.25$
Term $n = 7$: $s_7 = \frac{6 + 2 + 3 + 1}{4} = 3$
Term $n = 8$: $s_8 = \frac{2 + 3 + 1 + 7}{4} = 3.25$
Thus, the SMA of length 4 is the sequence (4.25, 3.5, 3.25, 3, 3.25). Note how the values of the SMA do not jump around as much as the values of the original sequence.
The number of terms to use in a simple moving average depends on the dataset and desired outcome. If the intent is to reduce noise, then a small number of terms (say, 3 to 5) may be sufficient. Sometimes inconsistencies in data collection can be mitigated by a judicious use of SMA. For example, suppose data is collected on the number of walk-in customers at the department of motor vehicles, which only operates Monday through Friday. No data is collected on the weekends, while the number of walk-ins on Mondays may be disproportionately high because people have had to wait until Monday to use their services. Here, an SMA of 7 days would smooth out the data, spreading the weekly walk-in data more evenly over all the days of the week.
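As a quick check of the computation above, a one-line pandas version of the same window-4 SMA is sketched here (the Series and variable names are ours):
Python Code
import pandas as pd

# The example sequence and its window-4 simple moving average
y = pd.Series([5, 4, 2, 6, 2, 3, 1, 7])
sma = y.rolling(window=4).mean()
print(sma.dropna().tolist())  # [4.25, 3.5, 3.25, 3.0, 3.25]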
Example 5.3
Problem
Find the simple moving average of window size 3 for the data from Table 5.1 and compare with the original line graph.
Solution
The SMA will begin at index $n = 3$.
The SMA is graphed together with the original time series in Figure 5.10.
Simple moving averages provide an easy way to identify a trend-cycle curve in the time series. In effect, the SMA can serve as an estimate of the trend-cycle component $T_t$ of the time series. In practice, when using SMA to isolate the trend-cycle component, the terms of the SMA are centered, which means that when computing the SMA term $s_n$, the term $y_n$ shows up right in the middle of the window, with the same number of terms to the left and to the right.
For example, the SMA found in Example 5.3 may be shifted so that $s_n = \frac{y_{n-1} + y_n + y_{n+1}}{3}$. This has the effect of shifting the index of the SMA terms back 1 so that the SMA begins at index $n = 2$.
This procedure works well when the window size $m$ is odd, but when it is even, a common practice is to compute two SMAs in which the term $y_n$ is very nearly in the middle of the window and then average their results. For example, using the same data as in Example 5.3, if we wanted to compute a centered SMA of length 4, then the first term would be $s_3$, and we would find two averages: one over the window $y_1, y_2, y_3, y_4$ and one over the window $y_2, y_3, y_4, y_5$.
The average of the two averages is equivalent to the following more efficient formula. First consider the previous example:

$$s_3 = \frac{\tfrac{1}{2}y_1 + y_2 + y_3 + y_4 + \tfrac{1}{2}y_5}{4}$$

This suggests a general pattern for even window sizes. In summary, we have two closely related formulas for centered SMA, depending on whether $m$ is even or odd. For an odd window size $m = 2k + 1$:

$$s_n = \frac{y_{n-k} + y_{n-k+1} + \cdots + y_{n+k}}{m}$$

For an even window size $m = 2k$:

$$s_n = \frac{\tfrac{1}{2}y_{n-k} + y_{n-k+1} + \cdots + y_{n+k-1} + \tfrac{1}{2}y_{n+k}}{m}$$
Of course, there are functions in statistical software that can work out the centered SMA automatically for you.
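For instance, one way to compute centered SMAs with pandas is sketched below for the small sequence used earlier; the even-window case uses the double-averaging trick just described (the window sizes are illustrative):
Python Code
import pandas as pd

y = pd.Series([5, 4, 2, 6, 2, 3, 1, 7])

# Odd window (m = 3): center=True places each window symmetrically about its index
centered3 = y.rolling(window=3, center=True).mean()

# Even window (m = 4): average two consecutive length-4 windows, then realign the result
# so each value sits at the center of its combined window (the "2 x 4" moving average)
centered4 = y.rolling(window=4).mean().rolling(window=2).mean().shift(-2)

print(centered3.tolist())
print(centered4.tolist())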
Differencing
In mathematics, the first-order difference of a sequence $Y = (y_n)$ is another sequence, denoted $\Delta Y$, consisting of the differences of consecutive terms. That is,

$$(\Delta Y)_n = y_n - y_{n-1}$$
Note that there is one fewer term in the difference $\Delta Y$ compared to the original time series $Y$. Differencing has the effect of leveling out a uniform upward or downward trend without affecting other factors such as seasonal variation and noise.
Moreover, the average (arithmetic mean) of the terms of the difference series measures the rate of change of the original time series, much as the slope of the linear regression does. (Indeed, if you have seen some calculus, you might notice that there is a strong connection between differencing and computing the derivative of a continuous function.)
Example 5.4
Problem
Find the first-order difference time series for the data from Table 5.1 and compare with the original line graph. Find the average of the difference series and compare the value to the slope of the linear regression found in Example 5.1.
Solution
Since the original time series from Table 5.1 has 11 terms, the difference series will have only 10 terms, as shown in Table 5.3.
Term $n$ | S&P Index $y_n$ | Term of $\Delta Y$ |
---|---|---|
1 | 1848.36 | N/A |
2 | 2058.9 | 210.54 |
3 | 2043.94 | −14.96 |
4 | 2238.83 | 194.89 |
5 | 2673.61 | 434.78 |
6 | 2506.85 | −166.76 |
7 | 3230.78 | 723.93 |
8 | 3756.07 | 525.29 |
9 | 4766.18 | 1010.11 |
10 | 3839.5 | −926.68 |
11 | 4769.83 | 930.33 |
Figure 5.11 shows the graph of the original data together with its difference series.
The average value of $\Delta Y$ is:

$$\frac{210.54 + (-14.96) + 194.89 + \cdots + 930.33}{10} \approx 292.15$$
Comparing with the slope of the linear regression, which is about 304 (recall Example 5.1), we find the two values are relatively close to each other.
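As a quick software check of this example, pandas can compute the difference series and its mean in a couple of lines (a minimal sketch using the Table 5.1 values):
Python Code
import pandas as pd

# S&P year-end values from Table 5.1 (2013-2023)
sp = pd.Series([1848.36, 2058.9, 2043.94, 2238.83, 2673.61, 2506.85,
                3230.78, 3756.07, 4766.18, 3839.5, 4769.83])

diff = sp.diff().dropna()     # first-order difference (10 terms)
print(round(diff.mean(), 2))  # about 292.15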
Differencing can be used to check whether a time series trend curve is essentially linear, quadratic, or polynomial of higher order. First, we need to recall a definition from algebra that is typically applied to polynomial functions but can also be applied to sequences. If a sequence is given by a polynomial formula of the form

$$y_n = a_d n^d + a_{d-1} n^{d-1} + \cdots + a_1 n + a_0$$

where $d$ is a whole number, each of the coefficients $a_k$ (for $0 \le k \le d$) is a constant, and the leading term is nonzero ($a_d \neq 0$), then the sequence has order or degree $d$. (Order or degree refers to the number of terms or lags used in a model to describe the time series.)
For example, any sequence of the form $y_n = a_1 n + a_0$ (with $a_1 \neq 0$) has order 1 (and may be called linear). A sequence of the form $y_n = a_2 n^2 + a_1 n + a_0$ (with $a_2 \neq 0$) has order 2 (and is called quadratic). Quadratic data fits well to a parabola, which would indicate a curving trend in the data.
If a sequence has order $d \ge 1$, then the difference sequence has order $d - 1$. If the original sequence is already of order 0, which means the sequence is a constant sequence (all terms being the same constant), then the difference sequence simply produces another constant sequence (order 0) in which every term is equal to 0. This important result can be proved mathematically, but we will not present the proof in this text.
As an example, consider the sequence $(5, 8, 11, 14, 17, \ldots)$, which has order 1, since you can find the values using the formula $y_n = 3n + 2$. The difference sequence is $(3, 3, 3, \ldots)$, which is a constant sequence, so it has order 0. Notice that taking the difference one more time yields the constant 0 sequence, $(0, 0, 0, \ldots)$, and taking further differences would only reduce the number of terms, but each term would remain at the value 0. (For simplicity, we will assume that there are always enough terms in a sequence so that the difference is also a non-empty sequence.)
The second-order difference, $\Delta^2 Y$, or difference of the difference, would reduce the order of a $d$-order sequence twice, to $d - 2$ if $d \ge 2$ or to the constant zero sequence if $d < 2$. In general, a $k$-order difference, defined by $\Delta^k Y = \Delta(\Delta^{k-1} Y)$, reduces the order of a $d$-order sequence to $d - k$ if $k \le d$ or to order 0 otherwise. With a little thought, you may realize that we now have a simple procedure to determine the order of a sequence that does not require us to have the formula for $y_n$ in hand.
If a sequence has a well-defined order (in other words, there is a polynomial formula that is a good model for the terms of the sequence, even if we do not know what that formula is yet), then the order is the smallest whole number $d$ such that the $(d+1)$-order difference, $\Delta^{d+1} Y$, is equal to either a constant 0 sequence or white noise.
Example 5.5
Problem
Determine the order of the sequence (10, 9, 6, 7, 18, 45, 94, 171) using differencing.
Solution
Let $Y$ stand for the given sequence.
First-order difference: $\Delta Y = (-1, -3, 1, 11, 27, 49, 77)$
Second-order difference: $\Delta^2 Y = (-2, 4, 10, 16, 22, 28)$
Third-order difference: $\Delta^3 Y = (6, 6, 6, 6, 6)$
Fourth-order difference: $\Delta^4 Y = (0, 0, 0, 0)$
The given sequence has order 3 since the fourth-order difference resulted in all zeros.
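A small sketch of the same procedure in Python, using NumPy's diff function to difference repeatedly until only zeros remain (the loop structure is ours):
Python Code
import numpy as np

y = np.array([10, 9, 6, 7, 18, 45, 94, 171])

# Keep differencing until the *next* difference would be all zeros;
# the number of differences taken at that point equals the order
d, order = y, 0
while not np.allclose(np.diff(d), 0):
    d = np.diff(d)
    order += 1
print(order)  # 3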
We cannot hope to encounter real-world time series data that is exactly linear, quadratic, or of any given degree. Every time series will have some noise in addition to possible seasonal patterns. However, the trend-cycle curve may still be well approximated by a polynomial model of some order.
SMA and Differencing in Excel
Excel can easily do simple moving averages and differencing. The ability to write one equation in a cell and then drag it down so that it works on all the rows of data is a very powerful and convenient feature. By way of example, let’s explore the dataset MonthlyCoalConsumption.xlsx. First, you will need to save the file as an Excel worksheet, as CSV files do not support equations.
Let’s include a centered 13-term window SMA (see Figure 5.12). In cell C8, type “=AVERAGE(B2:B14)” as shown in Figure 5.12. Note: There should be exactly 6 rows above and 6 rows below the center row at which the formula is entered.
Next, click on the square in the lower right corner of cell C8 and drag it all the way down to cell C173, which is 6 rows before the final data value. Column C now contains the SMA time series for your data, as shown in Figure 5.13.
You can generate the graph of the time series and the SMA together, which is shown in Figure 5.14. Compare your computed SMA with the trendline created by Excel in Figure 5.7.
Let’s now work out the first-order difference. The idea is very similar. Put the formula for the difference, which is nothing more than “=B3-B2,” into cell D3, as in Figure 5.15.
Then drag the formula down to the end of the column. The result is shown in Figure 5.16.
If you need to compute the second-order differences, just do the same procedure to find the difference of column D. Higher-order differences can easily be computed in the analogous way.
SMA and Differencing in Python
Computing an SMA in Python takes only a single line of code. Here is how it works on the Coal Consumption dataset (found in MonthlyCoalConsumption.xlsx), using a window size of 12. (Before, we used a window of 13 because odd-length windows are easier to work with than even-length ones. However, Python easily handles all cases automatically.) The key here is the function rolling(), which creates a DataFrame of rolling windows of specified length. Then the mean() command computes the average of each window of data. Computing the first-order difference is just as easy, using the diff() command.
Python Code
# Import libraries
import pandas as pd # for dataset management
import matplotlib.pyplot as plt # for visualizations
from matplotlib.ticker import MaxNLocator, FuncFormatter # for graph formatting and y-axis formatting
import matplotlib.dates as mdates # for date formatting
# Read the Excel file into a Pandas DataFrame
df = pd.read_excel('MonthlyCoalConsumption.xlsx')
# Convert 'Month' column to datetime format
df['Month'] = pd.to_datetime(df['Month'], format='%m/%d/%Y')
# Calculate a simple moving average with a window size of 12
df['SMA'] = df['Value'].rolling(window=12).mean()
# Calculate the first-order difference
df['Diff'] = df['Value'].diff()
# Function to format the Y-axis values
def y_format(value, tick_number):
    return f'{int(round(value)):,}'
# Create the plot
plt.figure(figsize=(12, 6))
# Plot the time series, SMA, and difference
plt.plot(df['Month'], df['Value'], marker='', linestyle='-', color='b', label='Value')
plt.plot(df['Month'], df['SMA'], marker='', linestyle='-', color='r', label='SMA')
plt.plot(df['Month'], df['Diff'], marker='', linestyle='-', color='g', label='Diff')
# Set the x-axis limits to start from the first date and end at the maximum date in the data
plt.xlim(pd.to_datetime('01/01/2016', format='%m/%d/%Y'), df['Month'].max())
# Format the x-axis to show dates as MM/DD/YYYY and set the locator for a fixed number of ticks
ax = plt.gca()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y'))
ax.xaxis.set_major_locator(MaxNLocator(nbins=12)) # Set the number of ticks to a reasonable value
# Manually set the rotation of date labels to 45 degrees
plt.xticks(rotation=45)
# Title and labels
plt.title('Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022')
plt.xlabel('Month')
plt.legend()
# Apply the formatter to the Y-axis
ax.yaxis.set_major_formatter(FuncFormatter(y_format))
plt.tight_layout() # Adjust layout to ensure labels fit well
plt.show()
The resulting output will look like this:
Analyzing Seasonality
Once the time series has been detrended using one of the methods already discussed (or other, more sophisticated methods), what is left over should represent only the seasonality, or seasonal variation, $S_t$, and error term, $E_t$.
How could we analyze and model $S_t$? Suppose there were no error term at all, and so we are looking at a time series with pure seasonal variation and no other factors. In this case, the $S_t$ component would look like a repeating pattern, such as the pattern shown in Figure 5.17. As mentioned in Components of Time Series Analysis, the time interval that it takes before the pattern completes once and begins to repeat is called the period. In a purely seasonal component, we would expect $S_{t+p} = S_t$, where $p$ is the period, for all values of $t$.
While it is easy to spot the repeating pattern in a time series that does not have noise, the same cannot be said for the general case. There are several tools available for isolating the seasonal component of a detrended time series.
Identifying the Period
First we need to identify the period of the seasonal data. The simplest method is to use prior knowledge about the time series data, along with a visual inspection. Often, a seasonal pattern is exactly that—a pattern that repeats over the course of one year. Thus, the period would be 1 year (or 12 months, 52 weeks, 365 days, etc., depending on the time steps of your time series). For example, even without detrending the data, it was easy to identify a period of 4 quarters (1 year) in the data in Example 5.2.
If repetition of a seasonal pattern is not apparent from the data, then it may be discovered using standard statistical measures such as autocorrelation. Autocorrelation is the measure of correlation between a time series and the same time series shifted by some number of time steps. (Correlation and Linear Regression Analysis discussed correlation between two sets of data. Autocorrelation is the same idea applied to the same dataset with a shifted copy of itself.) The size of the shift is called the lag.
The autocorrelation function (ACF) measures the correlation at each lag starting with 0 or 1. Note, lag 0 always produces an autocorrelation of 1 since the time series is being compared to an exact copy of itself. Thus, we are only interested in the correlation coefficient at positive lags. However, small lags generally show high correlation with the original time series as well, simply because data measured within shorter time periods are generally similar to each other. So, even if the ACF shows relatively large correlation at lags of 1, 2, 3, etc., there may not be a true seasonal component present. The positive lag corresponding to the first significant peak, or autocorrelation value that is much larger than the previous values, may indicate a seasonal component with that period.
Later, we will see how to compute ACF of a time series using Python, but first let’s explore further topics regarding seasonal components.
Seasonally Adjusted Data
Once you have determined the period of the seasonal component, you can then extract from the time series. As usual, there are a number of different ways to proceed.
Let's assume that you have detrended the time series so that what remains consists of a seasonal component and noise term:

$$Y'_t = S_t + E_t$$
Moreover, suppose that the period of $S_t$ is $p$. Of course, with the noise term still present in the time series, we cannot expect that $Y'_{t+p} = Y'_t$ holds exactly, but it should be the case that $Y'_{t+p} \approx Y'_t$ for every $t$. In other words, the data should look approximately the same when shifted by $p$ time steps. Our job now is to isolate one complete period of data that models the seasonal variation throughout the time series. In other words, we want to produce an estimate $\hat{S}_t$ that is exactly periodic ($\hat{S}_{t+p} = \hat{S}_t$ for all $t$) and models the underlying seasonal component as closely as possible.
For any given index $t$ between 1 and $p$, we expect the data points $Y'_t, Y'_{t+p}, Y'_{t+2p}, \ldots$ to hover around the same value or, in statistical language, to have rather low variance (much lower than the time series as a whole, that is). So it seems reasonable to use the mean of this slice of data to define the value of $\hat{S}_t$. For simplicity, assume that the time series has exactly $N = kp$ data points, a whole number $k$ of complete periods (if not, then the data can be padded in an appropriate way). Then the estimate for the seasonal component would be defined for $1 \le t \le p$ by the formula

$$\hat{S}_t = \frac{Y'_t + Y'_{t+p} + Y'_{t+2p} + \cdots + Y'_{t+(k-1)p}}{k}$$
As usual, the formula looks more intimidating than the concept that it's based upon. As previously mentioned, this represents nothing more than the average of all the values of the time series that occur every $p$ time steps.
Once $\hat{S}_t$ is computed in the range $1 \le t \le p$, extend it by repeating the same values in each period until you reach the last index of the original series. (Note: The noise term may now be estimated as the series of residuals, $Y'_t - \hat{S}_t$, which will occupy our interest in Forecast Evaluation Methods.)
Finally, the estimate $\hat{S}_t$ can then be used in conjunction with the original time series to smooth out the seasonal variation. Note that $\hat{S}_t$ essentially measures the fluctuations of observed data above or below the trend. We expect the mean of $\hat{S}_t$ to be close to zero in any window of time. Define the seasonally adjusted data as the time series

$$Y_t - \hat{S}_t$$
A seasonally adjusted time series is closely related to the underlying trend-cycle curve, $T_t$, though generally it contains the noise and other random factors that $T_t$ does not include.
Example 5.6
Problem
The revenue data from Walt's Water Adventures (see Table 5.2) seems to display seasonality of period 4. Using the level as the trend, detrend the data, compute the seasonal component $\hat{S}_t$, and find the seasonally adjusted time series.
Solution
First, we need to detrend the data. The level is the average of all 12 observations. Using a calculator or software, the level is found to be $\frac{27305}{12} \approx 2275.42$. Subtract to detrend the data (see Table 5.4). For example, the first term of the detrended data would be $1799 - 2275.42 = -476.42$.
There are a total of 12 observations, so there will be $k = 3$ periods of size $p = 4$. Using the values of the detrended series in the formula for $\hat{S}_t$ (keeping intermediate values unrounded and rounding only the final results), we find:

$\hat{S}_1 \approx -13.08$ (Spring), $\hat{S}_2 \approx 1261.92$ (Summer), $\hat{S}_3 \approx -484.08$ (Fall), and $\hat{S}_4 = -764.75$ (Winter).
Then repeat this block of 4 values as needed to obtain the entire seasonal component. Finally, the seasonally adjusted time series is the difference between the observed data and seasonal component (see Table 5.4).
k | Quarter | Revenues | Detrended | Seas. Comp. | Seasonally Adj. |
---|---|---|---|---|---|
1 | 2021 Spring | 1799 | −476.42 | −13.08 | 1812.08 |
2 | 2021 Summer | 3304 | 1028.58 | 1261.92 | 2042.08 |
3 | 2021 Fall | 1279 | −996.42 | −484.08 | 1763.08 |
4 | 2021 Winter | 999 | −1276.42 | −764.75 | 1763.75 |
5 | 2022 Spring | 2284 | 8.58 | −13.08 | 2297.08 |
6 | 2022 Summer | 3459 | 1183.58 | 1261.92 | 2197.08 |
7 | 2022 Fall | 1964 | −311.42 | −484.08 | 2448.08 |
8 | 2022 Winter | 1594 | −681.42 | −764.75 | 2358.75 |
9 | 2023 Spring | 2704 | 428.58 | −13.08 | 2717.08 |
10 | 2023 Summer | 3849 | 1573.58 | 1261.92 | 2587.08 |
11 | 2023 Fall | 2131 | −144.42 | −484.08 | 2615.08 |
12 | 2023 Winter | 1939 | −336.42 | −764.75 | 2703.75 |
Figure 5.18 shows the data along with the seasonal component and seasonally adjusted data.
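For readers who prefer to verify the arithmetic by computer, here is a minimal pandas sketch of Example 5.6; the grouping by index modulo 4 implements the averaging formula for the seasonal component (variable names are ours):
Python Code
import pandas as pd

# Quarterly revenues from Table 5.2 (2021 Spring through 2023 Winter)
rev = pd.Series([1799, 3304, 1279, 999, 2284, 3459, 1964, 1594,
                 2704, 3849, 2131, 1939])

level = rev.mean()                     # about 2275.42
detrended = rev - level
# Average the detrended values that occur every 4th time step (the period)
seasonal_block = detrended.groupby(detrended.index % 4).mean()
seasonal = pd.Series(list(seasonal_block) * 3)   # repeat the block over all 3 years
adjusted = rev - seasonal                        # seasonally adjusted series
print(seasonal_block.round(2).tolist())  # [-13.08, 1261.92, -484.08, -764.75]
print(adjusted.round(2).tolist())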
Decomposing a Time Series into Components Using Python
There is a powerful Python library known as statsmodels that collects many useful statistical functions together. We will use the STL module to decompose the time series MonthlyCoalConsumption.xlsx into its components. First, we plot an ACF (autocorrelation function) to determine the best value for periodicity.
Python Code
# Import libraries
import pandas as pd # for dataset management
from statsmodels.graphics.tsaplots import plot_acf
# Read the Excel file into a Pandas DataFrame
df = pd.read_excel('MonthlyCoalConsumption.xlsx')
# Plot the ACF
plot_acf(df['Value'], lags=20, title='ACF');
The resulting output will look like this:
Points that are within the blue shaded area correspond to lags at which the autocorrelation is statistically insignificant. The lags that show the most significance are at 0, 1, 6, and 12. Lag 0 always gives 100% autocorrelation because we are comparing the unshifted series to itself. At lag 1, a high autocorrelation corresponds to the fact that consecutive data points are relatively close to each other (which is also to be expected in most datasets). Of special note are the lags at 6 and 12, corresponding to a possible seasonal period of 6 or 12 in the data. We will use period 12 in our STL decomposition.
Python Code
# Import libraries
from statsmodels.tsa.seasonal import STL
import matplotlib.pyplot as plt ## for data visualization
from matplotlib.ticker import FuncFormatter ## for formatting the y-axis (y_format was defined in an earlier code block)
# Perform seasonal decomposition using STL
stl = STL(df['Value'], period=12)
result = stl.fit()
# Plot components and format the y-axis labels
fig = result.plot()
for ax in fig.axes:
    ax.yaxis.set_major_formatter(FuncFormatter(y_format))
plt.show()
The resulting output will look like this:
The trend, seasonal variation, and residual (noise) components are all graphed in separate plots. To see them all on the same set of axes, we can use matplotlib (noise omitted).
Python Code
import matplotlib.pyplot as plt ## for data visualization
from matplotlib.ticker import FuncFormatter ## for formatting the y-axis (y_format was defined in an earlier code block)
# Extract the trend and seasonal components
trend = result.trend
seasonal = result.seasonal
# Plot the original data, trend, and seasonally adjusted data
plt.plot(df['Value'], label='Original Data')
plt.plot(trend, label='Trend')
plt.plot(seasonal, label='Seasonal')
plt.legend()
# Apply the formatter to the Y-axis
plt.gca().yaxis.set_major_formatter(FuncFormatter(y_format))
plt.show()
The resulting output will look like this:
Weighted Moving Averages and Exponential Moving Averages
Simple moving averages were discussed in detail at the beginning of this section. Now we expand the discussion to include more sophisticated weighted moving averages and their uses in smoothing data and forecasting. Recall from our discussion in Time Series Forecasting Methods that weighted moving averages are moving averages in which terms are given weights according to some formula or rule.
Suppose that a time series has no significant seasonal component (or the seasonal component has been removed so that the remaining series is seasonally adjusted). Thus, the series has only trend-cycle and noise components:

$$Y_t = T_t + E_t$$
As already discussed, a simple moving average (SMA) has the effect of smoothing out the data and providing an easy method for predicting the next term of the sequence. For reference, here is the SMA prediction model, with a window of $m$ terms:

$$\hat{y}_{t+1} = \frac{y_t + y_{t-1} + \cdots + y_{t-m+1}}{m}$$
This model may be regarded as a weighted moving average model (WMA) in which the weights of the $m$ most recent terms are all equal to $\frac{1}{m}$, while all other weights are set to 0. For example, in an SMA model with $m = 4$, the weights would be $\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4}, 0, 0, \ldots$
Even the naïve model, $\hat{y}_{t+1} = y_t$, counts as a WMA, with weights 1, 0, 0, …
In principle, a weighted moving average model may be built using any coefficients, giving a general form for any WMA:

$$\hat{y}_{t+1} = w_1 y_t + w_2 y_{t-1} + w_3 y_{t-2} + \cdots$$
Of course, if we want the model to do a good job in forecasting new values of the time series, then the coefficients must be chosen wisely. Typically, the weights must be normalized, which means that they sum to 1: $w_1 + w_2 + w_3 + \cdots = 1$.
However, there is no hard-and-fast requirement that the sum must be equal to 1.
Rather than giving the same weight to some number of previous terms as the SMA model does, you might choose to give greater weight to more recent terms. For example, one might use powers of $\frac{1}{2}$ as weights:

$$\hat{y}_{t+1} = \frac{1}{2}y_t + \frac{1}{4}y_{t-1} + \frac{1}{8}y_{t-2} + \frac{1}{16}y_{t-3} + \cdots$$

This method allows the most recent term to dominate the sum, and each successive term contributes half as much in turn. (Note: The sequence $\frac{1}{2}$, $\frac{1}{4}$, $\frac{1}{8}$, … would be a normalized sequence if infinitely many terms were included, but even a small finite number of terms, like 5 or so, adds up to a sum that is pretty close to 1.)
This is an example of an exponential moving average, or EMA. The most general formula for the EMA model is

$$\hat{y}_{t+1} = \alpha y_t + \alpha(1 - \alpha) y_{t-1} + \alpha(1 - \alpha)^2 y_{t-2} + \alpha(1 - \alpha)^3 y_{t-3} + \cdots$$

where the smoothing parameter $\alpha$ is a constant between 0 and 1. (Note that the previous example corresponds to an EMA model with $\alpha = \frac{1}{2}$.)
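Before working the next example by hand, here is a minimal sketch of this weighted-sum form of the EMA forecast in plain Python; the helper function name, the cutoff of 10 terms, and the choice of alpha are illustrative assumptions:
Python Code
# S&P year-end values from Table 5.1 (oldest to newest)
sp = [1848.36, 2058.9, 2043.94, 2238.83, 2673.61, 2506.85,
      3230.78, 3756.07, 4766.18, 3839.5, 4769.83]

def ema_forecast(values, alpha, terms=10):
    """Forecast the next value as an exponentially weighted sum of past values."""
    recent_first = list(reversed(values))[:terms]
    weights = [alpha * (1 - alpha) ** k for k in range(len(recent_first))]
    return sum(w * v for w, v in zip(weights, recent_first))

# Exponentially weighted forecast of the next value, using the 10 most recent observations
print(round(ema_forecast(sp, alpha=0.5), 2))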
Example 5.7
Problem
Use EMA models with $\alpha = \frac{1}{2}$ (equivalently, 0.5) and $\alpha = 0.7$ for the S&P Index data from Table 5.1 to estimate the value of the S&P Index at the end of year 2024.
Solution
First, for $\alpha = \frac{1}{2}$, the model is:

$$\hat{y}_{t+1} = \frac{1}{2}y_t + \frac{1}{4}y_{t-1} + \frac{1}{8}y_{t-2} + \cdots$$

Now plug in the data (remember to use the most recent data point first and work your way backward). Eight terms were used, as the terms beyond that point are rather close to 0. Table 5.5 records the results.
For $\alpha = 0.7$, use the formula for the weights: $w_1 = 0.7$, $w_2 = 0.7 \times 0.3 = 0.21$, $w_3 = 0.7 \times 0.3^2 = 0.063$, etc. This time, the coefficients decrease more quickly than in the case $\alpha = 0.5$, so only 5 terms are significant. Table 5.5 shows the results.
Year | S&P Index at Year-End | Exponentially Weighted ($\alpha = 0.5$) | Exponentially Weighted ($\alpha = 0.7$) |
---|---|---|---|
2013 | 1848.36 | ||
2014 | 2058.9 | ||
2015 | 2043.94 | ||
2016 | 2238.83 | ||
2017 | 2673.61 | ||
2018 | 2506.85 | ||
2019 | 3230.78 | ||
2020 | 3756.07 | ||
2021 | 4766.18 | ||
2022 | 3839.5 | ||
2023 | 4769.83 | ||
2024 | N/A | 4,349 | 4,541 |
Therefore, we find estimates of 4,349 and 4,541 for the S&P value at the end of 2024, by the EMA models with $\alpha = 0.5$ and $\alpha = 0.7$, respectively. Note that larger values of $\alpha$ emphasize the most recent data, while smaller values allow more of the past data points to influence the average. If the time series trend experiences more variability (due to external factors that we have no control over), then an EMA with a relatively large $\alpha$-value would be more appropriate than one with a small $\alpha$-value.
EMA models may also serve as smoothing techniques, isolating the trend-cycle component. In this context, it's better to use an equivalent form of the EMA model:

$$\hat{y}_{t+1} = \alpha y_t + (1 - \alpha)\hat{y}_t$$
In this form, we build a series of estimates recursively by incorporating the terms of the original time series one at a time. The idea is that the next estimate, $\hat{y}_{t+1}$, is found by a weighted average of the current data point, $y_t$, and the current estimate, $\hat{y}_t$. Note, we may set $\hat{y}_1 = y_1$ since we need to begin the process somewhere. It can be shown mathematically that this procedure is essentially the same as working out the sum of exponentially weighted terms directly and is much more efficient.
Example 5.8
Problem
Use the recursive EMA formula with $\alpha = 0.75$ to smooth the S&P Index time series (Table 5.1), and then use the EMA model to estimate the value of the S&P Index at the end of 2024. Plot the EMA together with the original time series.
Solution
With $\alpha = 0.75$, the recursive EMA formula becomes $\hat{y}_{t+1} = 0.75\,y_t + 0.25\,\hat{y}_t$. Table 5.6 shows the results of the computation.
Year | S&P Index at Year-End | EMA Estimate |
---|---|---|
2013 | 1848.36 | 1848.36 |
2014 | 2058.9 | 1848.36 |
2015 | 2043.94 | 2006.27 |
2016 | 2238.83 | 2034.52 |
2017 | 2673.61 | 2187.75 |
2018 | 2506.85 | 2552.15 |
2019 | 3230.78 | 2518.17 |
2020 | 3756.07 | 3052.63 |
2021 | 4766.18 | 3580.21 |
2022 | 3839.5 | 4469.69 |
2023 | 4769.83 | 3997.05 |
2024 | N/A | 4576.63 |
According to this model, the S&P Index will be about 4577 at the end of 2024. Figure 5.19 shows the time series together with the exponential smoothing model.
Note, EMA smoothing can be done very easily in Excel. Simply type the formulas =B2 into cell C2 and =0.75*B2+0.25*C2 into cell C3. Then drag cell C3 down to the end of the data to compute the EMA. You can also do EMA smoothing in Python using the ewm() function applied to a dataset. Note, the "span" value should be set to $\frac{2}{\alpha} - 1$, as shown in this code.
Python Code
# Import libraries
import pandas as pd ## for dataset management
# Creating a dictionary with the given data
data = {
'Year': [2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023],
'SP500': [1848.36, 2058.9, 2043.94, 2238.83, 2673.61, 2506.85, 3230.78, 3756.07, 4766.18, 3839.5, 4769.83]
}
# Creating DataFrame
df = pd.DataFrame(data)
alpha = 0.75
df['ema']=df['SP500'].ewm(span=2/alpha-1, adjust=False).mean()
print(df['ema'])
The resulting output will look like this:
0 1848.360000
1 2006.265000
2 2034.521250
3 2187.752812
4 2552.145703
5 2518.173926
6 3052.628481
7 3580.209620
8 4469.687405
9 3997.046851
10 4576.634213
Name: ema, dtype: float64
Autoregressive Integrated Moving Average (ARIMA)
In this section, we develop a very powerful tool that combines multiple models together and is able to capture trend and seasonality. The method is called autoregressive integrated moving average, or ARIMA for short, and it consists of three main components:
- Autoregressive (AR) model. This component captures patterns in the time series that are repeated, including seasonal components as well as correlation with recent values.
- Integrative (I) component. This component represents the differencing operation discussed in the section on detrending a time series.
- Moving average (MA) model. Despite the name, this component is not the same as the moving average discussed previously, although there is a mathematical connection. Instead of focusing on averages of existing data points, MA models incorporate the influence of past forecast errors to predict the next value of the time series.
Before getting into the specifics of each component of ARIMA, we need to talk about the term stationary, which refers to a time series in which the variance is relatively constant over time, an overall upward or downward trend cannot be found, and no seasonal patterns exist. White noise is an example of a stationary time series; however, the term stationary is a bit more general, as it includes the possibility of a cyclic component in the data. Recall, a cyclic component (despite its name) is a longer-term rise or fall in the values of a time series that may occur from time to time, but the occurrences are unpredictable.
The AR and MA models work best on stationary time series. The integrative component is the tool that can detrend the data, transforming a nonstationary time series into a stationary one so that AR and/or MA may be used.
Autoregressive Models
The autoregressive models take into account a fixed number of previous values of the time series in order to predict the next value. In this way, AR models are much like WMA models previously discussed, in the sense that the next term of the time series depends on a sum of weighted terms of previous values; however, in the AR model, the terms used are the terms of the model itself.
Suppose we want to model a time series, $Y_t$, using an AR model. First, we should pick a value $p$, called the order of the model, which specifies how many previous terms to include in the model. Then we may call the model an AR($p$) model. If there is a strong seasonal component of period $p$, then it makes sense to choose that value as the order.
An AR($p$) model is a recursive formula of the form

$$\hat{y}_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p}$$

Here, the parameters or weights, $c$ and $\phi_1, \phi_2, \ldots, \phi_p$, are constants that must be chosen so that the values of the time series fit the known data as closely as possible (i.e., $\hat{y}_t \approx y_t$ for all known observations). There are also certain restrictions on the possible values of each weight $\phi_k$, but we do not need to delve into those specifics in this text, as we will rely on software to do the work of finding the parameters for us.
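The text relies on software for parameter estimation. As one illustration (a sketch, not the approach used for the full ARIMA model later in this section), the statsmodels AutoReg class can fit an AR(p) model directly; choosing 12 lags here simply mirrors the monthly seasonality of the coal dataset used elsewhere in this section.
Python Code
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

df = pd.read_excel('MonthlyCoalConsumption.xlsx')

# Fit an AR(12) model: each value is modeled as a constant plus a weighted
# sum of the 12 previous values
ar_results = AutoReg(df['Value'], lags=12).fit()
print(ar_results.params)   # the fitted constant and the 12 AR weights

# Predict 12 steps past the end of the observed series
print(ar_results.predict(start=len(df), end=len(df) + 11))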
Integrative Component
The “I” in ARIMA stands for integrative, which is closely related to the word integral in calculus. Now if you have seen some calculus, you might recall that the integral is just the opposite of a derivative. The derivative, in turn, is closely related to the idea of differencing in a time series. Of course, you don’t have to know calculus to understand the idea of an integrative model.
Rather than producing another component of the time series, the integrative component of ARIMA represents the procedure of taking differences (up to some specified order $d$) until the time series becomes stationary. Often, only the first-order difference is needed, and you can check for stationarity visually. A more sophisticated method, called the Augmented Dickey-Fuller (ADF) test, sets up a test statistic that measures the presence of non-stationarity in a time series. Using ADF, a p-value is computed. If the p-value is below a set threshold (such as 0.05), then the null hypothesis of non-stationarity can be rejected. Next, we will show how to run the ADF test on a time series and its differences to find out the best value for $d$ in ARIMA.
Then the AR and MA models can be produced on the resulting stationary time series. Integrating (which is the opposite of differencing) builds the model back up so that it can ultimately be used to predict values of the original time series $Y_t$.
Moving Average Models
An MA model of order $q$, or MA($q$), takes the form

$$\hat{y}_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$

Here, the $\varepsilon_t$ terms are residuals (forecast errors), and $\mu$ is the level of the time series (mean value of all the terms). Again, there are a multitude of parameters ($\theta_1, \theta_2, \ldots, \theta_q$) that need to be chosen in order for the model to make sense. We will rely on software to handle this in general.
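As a brief illustration (a sketch under our own assumptions, including the arbitrary choice q = 3), an MA(q) model can be fit with the same statsmodels ARIMA class used below by setting the p and d orders to zero and supplying a stationary series such as the second-order difference.
Python Code
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

df = pd.read_excel('MonthlyCoalConsumption.xlsx')
# Work with a (roughly) stationary series: the second-order difference
stationary = df['Value'].diff().diff().dropna()

# An MA(q) model is an ARIMA model with p = 0 and d = 0; q = 3 is an arbitrary choice here
ma_results = ARIMA(stationary, order=(0, 0, 3)).fit()
print(ma_results.params)  # the constant (level), the MA weights, and the error variance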
ARIMA Model in Python
The full ARIMA model, ARIMA($p$, $d$, $q$), incorporates all three components—autoregressive of order $p$, integrative with $d$-order differencing, and moving average of order $q$. For the time series MonthlyCoalConsumption.xlsx, it turns out that the second-order difference is stationary, as shown by a series of ADF tests.
Python Code
# Import libraries
import pandas as pd ## for dataset management
# Import adfuller from the statsmodels library
from statsmodels.tsa.stattools import adfuller
# Read data
df = pd.read_excel('MonthlyCoalConsumption.xlsx')
print('d = 0')
result = adfuller(df['Value'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
## Get the first difference
print('\n d = 1')
df1 = df['Value'].diff().dropna()
result = adfuller(df1)
print('ADF Statistic:', result[0])
print('p-value:', result[1])
## Get the second difference
print('\n d = 2')
df2 = df1.diff().dropna()
result = adfuller(df2)
print('ADF Statistic:', result[0])
print('p-value:', result[1])
The resulting output will look like this:
d = 0
ADF Statistic: -1.174289795062662
p-value: 0.6845566772896323
d = 1
ADF Statistic: -1.7838215415905252
p-value: 0.38852291349101137
d = 2
ADF Statistic: -8.578323839498081
p-value: 7.852009937900099e-14
Here, a p-value of about $7.9 \times 10^{-14}$, far below 0.05, indicates that the second-order difference is stationary, so we will set $d = 2$ in the ARIMA model. An ACF plot of the second difference can be used to determine $p$.
Python Code
# Import libraries
from statsmodels.graphics.tsaplots import plot_acf
# Plot the ACF of the second difference
plot_acf(df2, lags=20, title='ACF for d = 2 Difference');
We see a peak at lag 12, corresponding to the expected yearly seasonal variation, so we will set $p = 12$ in the ARIMA model. For this time series, positive values of $q$ did not change the accuracy of the model at all, so we set $q = 0$ (the MA component is unnecessary). The ARIMA(12, 2, 0) model was used to forecast 24 time steps (2 years) into the future. We are using the Python library statsmodels.tsa.arima.model to create the ARIMA model.
Python Code
# Import ARIMA model from statsmodels.tsa library
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt ## for plotting graphs
from matplotlib.ticker import MaxNLocator ## for graph formatting
from matplotlib.ticker import FuncFormatter ## for formatting y-axis
# Fit an ARIMA model
p = 12; d = 2; q = 0 # ARIMA component orders
model = ARIMA(df['Value'], order=(p, d, q))
results = model.fit()
# Make forecasts
fc = results.forecast(steps=24)
# Plot the results
plt.figure(figsize=(10, 6))
# Plot original time series
plt.plot(df['Value'], label='Original Time Series')
# Plot fitted and forecasted values
plt.plot(results.fittedvalues, color='red', label='Fitted Values')
plt.plot(fc, color='red', linestyle='dashed', label='Forecasted Values') ## Forecasted values
# Set labels and legend
plt.xlabel('Months')
plt.title('Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022')
plt.legend()
# Function to format the Y-axis values
def y_format(value, tick_number):
    return f'{value:,.0f}'
# Apply the formatter to the Y-axis
plt.gca().yaxis.set_major_formatter(FuncFormatter(y_format))
plt.show()
The resulting output will look like this:
Here, the blue curve is the original time series, and the red curve represents the ARIMA model, with forecasted values indicated by the dashed part of the curve. Although there is considerable error near the beginning of the time period, the model seems to fit the known values of the data fairly well, and so we expect that the future forecasts are reasonable.