Learning Outcomes
By the end of this section, you should be able to:
- 5.4.1 Explain the nature of error in forecasting a time series.
- 5.4.2 Compute common error measures for time series models.
- 5.4.3 Produce prediction intervals in a forecasting example.
A time series forecast essentially predicts middle-of-the-road values for future terms of the series. For the purposes of prediction, the model considers only the trend and any cyclical or seasonal variation that can be detected within the known data. Although noise and random variation certainly do influence future values of the time series, their precise effects cannot be predicted (by their very nature). Thus, a forecasted value is the model's best guess in the absence of an error term. On the other hand, error does affect how certain we can be of the model's forecasts. In this section, we focus on quantifying the error in forecasting using statistical tools such as prediction intervals.
Forecasting Error
Suppose that you have a model $\hat{x}_t$ for a given time series $x_t$. Recall from Components of Time Series Analysis that the residuals $\epsilon_t = x_t - \hat{x}_t$ quantify the error between the time series and the model and serve as an estimate of the noise or random variation.
The better the fit of the model to the observed data, the smaller the values of $|\epsilon_t|$ will be. However, even very good models can turn out to be poor predictors of future values of the time series. Ironically, the better a model does at fitting the known data, the worse it may be at predicting future data. There is a danger of overfitting. (This term is defined in detail in Decision-Making Using Machine Learning Basics, but for now we do not need to go deeply into this topic.) A common technique to avoid overfitting is to build the model using only a portion of the known data and then test the model's accuracy on the remaining data that was held out.
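As a minimal sketch of this holdout idea (the series values and the 80/20 split ratio below are illustrative choices, not taken from the chapter's data), note that for a time series the split must be chronological rather than random:
Python Code
import numpy as np
# A short illustrative series (values are hypothetical)
series = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0,
                   148.0, 148.0, 136.0, 119.0, 104.0, 118.0])
# Keep the earliest 80% of observations for fitting the model;
# never shuffle a time series before splitting
split = int(len(series) * 0.8)
train, test = series[:split], series[split:]
# Fit the model on `train` only; then compare its forecasts against
# `test`, which the model has never seen.
print(len(train), "training points;", len(test), "held-out points")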
To discuss the accuracy of a model, we should ask the complementary question: How far away are the predictions from the true values? In other words, we should try to quantify the total error. There are several measures of error, or measures of fit: metrics used to assess how well a model's predictions align with the observed data. We will only discuss a handful of them here that are most useful for time series. In each formula, $x_t$ refers to the $t$-th term of the time series, $\hat{x}_t$ is the corresponding predicted term, $n$ is the number of observations, and $\epsilon_t = x_t - \hat{x}_t$ is the error between the actual term and the predicted term of the series.
- Mean absolute error (MAE): $\text{MAE} = \frac{1}{n}\sum_{t=1}^{n}|\epsilon_t|$. A measure of the average magnitude of errors.
- Root mean squared error (RMSE): $\text{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\epsilon_t^2}$. A measure of the standard deviation of errors, penalizing larger errors more heavily than MAE does.
- Mean absolute percentage error (MAPE): $\text{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{\epsilon_t}{x_t}\right|$. A measure of the average relative errors—that is, a percentage between the predicted values and the actual values (on average).
- Symmetric mean absolute percentage error (sMAPE): $\text{sMAPE} = \frac{1}{n}\sum_{t=1}^{n}\frac{|\epsilon_t|}{\left(|x_t| + |\hat{x}_t|\right)/2}$. Similar to MAPE, a measure of the average relative errors, but scaled so that errors are measured in relation to both actual values and predicted values.
Of these error measures, MAE and RMSE are scale-dependent, meaning that the error is in direct proportion to the data itself. In other words, if all terms of the time series and the model were scaled by a factor of $c$, then the MAE and RMSE would both be multiplied by $c$ as well. On the other hand, MAPE and sMAPE are not scale-dependent. These measures of error are often expressed as percentages (by multiplying the result of the formula by 100%). However, neither MAPE nor sMAPE should be used for data that is measured on a scale that includes 0 or negative values, since their formulas divide by the data values themselves. For example, it would not be wise to use MAPE or sMAPE as a measure of error for a time series model of Celsius temperature readings. For all of these measures of error, lower values indicate less error and hence greater accuracy of the model, which is useful when comparing two or more models. A code sketch of all four measures follows.
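The formulas translate directly into code. Here is a minimal NumPy sketch (the function names are our own, not part of any standard library):
Python Code
import numpy as np

def mae(actual, predicted):
    # Mean absolute error: average magnitude of the residuals
    return np.mean(np.abs(actual - predicted))

def rmse(actual, predicted):
    # Root mean squared error: penalizes large residuals more heavily
    return np.sqrt(np.mean((actual - predicted) ** 2))

def mape(actual, predicted):
    # Mean absolute percentage error, relative to the actual values
    return np.mean(np.abs((actual - predicted) / actual))

def smape(actual, predicted):
    # Symmetric MAPE: errors relative to the average magnitude of
    # the actual and predicted values
    return np.mean(np.abs(actual - predicted) /
                   ((np.abs(actual) + np.abs(predicted)) / 2))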
Example 5.9
Problem
Compute the MAE, RMSE, MAPE, and sMAPE for the EMA smoothing model for the S&P Index time series, shown in Table 5.3.
Solution
The first step is to find the residuals, $\epsilon_t = x_t - \hat{x}_t$, and take their absolute values. For RMSE, we will need the squared residuals, included in Table 5.7. We also include the terms $\left|\frac{\epsilon_t}{x_t}\right|$ and $\frac{|\epsilon_t|}{(|x_t| + |\hat{x}_t|)/2}$, which are used for computing MAPE and sMAPE, respectively. Notice that all four formulas are averages of some kind, but RMSE requires an additional step of taking a square root.
Year | S&P Index at Year-End | EMA Estimate | Abs. Residuals $|\epsilon_t|$ | Squared Residuals $\epsilon_t^2$ | $\left|\frac{\epsilon_t}{x_t}\right|$ | $\frac{|\epsilon_t|}{(|x_t| + |\hat{x}_t|)/2}$
---|---|---|---|---|---|---
2013 | 1848.36 | 1848.36 | 0 | 0 | 0 | 0 |
2014 | 2058.9 | 1848.36 | 210.54 | 44327.09 | 0.102 | 0.108 |
2015 | 2043.94 | 2006.27 | 37.67 | 1419.03 | 0.018 | 0.019 |
2016 | 2238.83 | 2034.52 | 204.31 | 41742.58 | 0.091 | 0.096 |
2017 | 2673.61 | 2187.75 | 485.86 | 236059.94 | 0.182 | 0.2 |
2018 | 2506.85 | 2552.15 | 45.3 | 2052.09 | 0.018 | 0.018 |
2019 | 3230.78 | 2518.17 | 712.61 | 507813.01 | 0.221 | 0.248 |
2020 | 3756.07 | 3052.63 | 703.44 | 494827.83 | 0.187 | 0.207 |
2021 | 4766.18 | 3580.21 | 1185.97 | 1406524.84 | 0.249 | 0.284 |
2022 | 3839.5 | 4469.69 | 630.19 | 397139.44 | 0.164 | 0.152 |
2023 | 4769.83 | 3997.05 | 772.78 | 597188.93 | 0.162 | 0.176 |
N/A | N/A | Average: | 453.52 | 339008.62 | 0.127 | 0.137 |
$\text{MAE} \approx 453.52$. (Predicted values are [on average] about 450 units away from true values.)
$\text{RMSE} = \sqrt{339008.62} \approx 582.24$. (The errors between predicted and actual values have a standard deviation of about 580 from the ideal error of 0.)
$\text{MAPE} \approx 0.127$, or about 12.7%. (Predicted values are [on average] within about 13% of their true values.)
$\text{sMAPE} \approx 0.137$, or about 13.7%. (When looking at both predicted values and true values, their differences are on average about 14% from one another.)
These results may be used to compare the EMA estimate to some other estimation/prediction method to find out which one is more accurate. The lower the errors, the more accurate the method.
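As a check, these values can be reproduced directly from the data in Table 5.7 (computing each measure inline rather than reusing the functions sketched earlier):
Python Code
import numpy as np
# Year-end S&P Index values and EMA estimates from Table 5.7
actual = np.array([1848.36, 2058.90, 2043.94, 2238.83, 2673.61, 2506.85,
                   3230.78, 3756.07, 4766.18, 3839.50, 4769.83])
ema = np.array([1848.36, 1848.36, 2006.27, 2034.52, 2187.75, 2552.15,
                2518.17, 3052.63, 3580.21, 4469.69, 3997.05])
resid = actual - ema
print("MAE:  ", np.mean(np.abs(resid)))           # approx. 453.52
print("RMSE: ", np.sqrt(np.mean(resid ** 2)))     # approx. 582.24
print("MAPE: ", np.mean(np.abs(resid / actual)))  # approx. 0.127
print("sMAPE:", np.mean(np.abs(resid) /
                        ((np.abs(actual) + np.abs(ema)) / 2)))  # approx. 0.137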
Prediction Intervals
A forecast is often accompanied by a prediction interval giving a range of values the variable could take with some level of probability. For example, if the confidence level of a prediction interval is 80%, then the prediction interval contains a range of values that should include the actual future value with a probability of 0.8. Prediction intervals were introduced in Analysis of Variance (ANOVA) in the context of linear regression, so we won’t get into the details of the mathematics here. The key point is that we want a margin of error, $E$ (depending on $\alpha$), such that future observations of the data will be within the interval $\hat{x}_t \pm E$ with probability $1 - \alpha$, where $1 - \alpha$ is a chosen level of confidence.
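Before turning to the statsmodels demonstration below, here is a minimal sketch of how such an interval arises, assuming the forecast errors are approximately normally distributed with a known standard deviation (the point forecast and standard deviation here are hypothetical numbers, not from the chapter's dataset):
Python Code
from scipy.stats import norm

forecast = 250.0   # hypothetical point forecast for the next period
sigma = 12.0       # hypothetical standard deviation of the forecast errors
alpha = 0.2        # 1 - alpha = 0.80 confidence level

z = norm.ppf(1 - alpha / 2)   # critical value, about 1.28 for 80%
E = z * sigma                 # margin of error
print(f"80% prediction interval: ({forecast - E:.1f}, {forecast + E:.1f})")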
Here, we will demonstrate how to use Python to obtain prediction intervals.
The Python module statsmodels.tsa.arima.model contains functions for finding confidence intervals as well. Note that the method get_forecast() is used here rather than forecast(), as the former returns an object that also provides the information needed to construct the intervals.
Python Code
### Please run all code from previous section before running this ###
# Forecast 24 steps ahead
forecast_steps = 24
forecast_results = results.get_forecast(steps=forecast_steps)
# Extract forecast values and confidence intervals;
# alpha = 0.2 gives an 80% interval
forecast_values = forecast_results.predicted_mean
confidence_intervals = forecast_results.conf_int(alpha=0.2)
# Plot the results
plt.figure(figsize=(10, 6))
# Plot original time series
plt.plot(df['Value'], label='Original Time Series')
# Plot fitted values
plt.plot(results.fittedvalues, color='red', label='Fitted Values')
# Plot forecasted values with confidence intervals
plt.plot(forecast_values, color='red', linestyle='dashed', label='Forecasted Values')
plt.fill_between(
    range(len(df), len(df) + forecast_steps),  # x-positions continue df's integer index
    confidence_intervals.iloc[:, 0],
    confidence_intervals.iloc[:, 1],
    color='red', alpha=0.2,
    label='80% Confidence Interval'
)
# Set labels and legend
plt.xlabel('Months')
plt.title('Monthly Consumption of Coal for Electricity Generation in the United States from 2016 to 2022')
plt.legend()
# Apply the formatter to the Y-axis
plt.gca().yaxis.set_major_formatter(FuncFormatter(y_format))
plt.show()
The resulting output will look like this:
The forecast data (dashed curve) is now surrounded by a shaded region. With 80% probability, all future observations should fit into the shaded region. Of course, the further into the future we try to go, the more uncertain our forecasts will be, which is indicated by the wider and wider confidence interval region.
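Continuing from the code above, you can check this widening numerically: confidence_intervals is the two-column table (lower and upper bounds) returned by conf_int(), so the interval width at each step is the difference of its columns.
Python Code
# Width of the 80% interval at each forecast step
widths = confidence_intervals.iloc[:, 1] - confidence_intervals.iloc[:, 0]
print(widths.head(3))  # narrower intervals for near-term forecasts
print(widths.tail(3))  # wider intervals farther into the future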