Forecast Evaluation Procedures When Dealing with Time Series
Forecast evaluation is a critical step in assessing the accuracy and effectiveness of forecasting models. It helps determine how well a model is performing and whether it can reliably predict future outcomes. Below are several commonly used forecast evaluation procedures:
1. Visual Inspection
- Plotting Forecast vs. Actuals: Plotting the forecasted values against the actual values is a simple way to assess how well the model fits the data. If the forecast closely matches the actual values, the model is likely performing well.
- Residuals Plot: The residuals (forecast error) should resemble white noise — that is, they should be randomly scattered around zero without any discernible patterns. A clear pattern in the residuals suggests that the model may not be capturing some underlying structure in the data.
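As a numerical complement to the residuals plot, the sketch below computes the sample autocorrelation of the residuals and compares it with the approximate 95% band of ±1.96/√n that is commonly used for white noise. The function names, the choice of five lags, and the example residuals are illustrative assumptions; it is a rough screen, not a formal test.

```python
import math

def residual_autocorrelation(residuals, max_lag=5):
    """Sample autocorrelation of the residuals at lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    acf = []
    for lag in range(1, max_lag + 1):
        num = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        acf.append(num / denom)
    return acf

def looks_like_white_noise(residuals, max_lag=5):
    """Rough screen: every autocorrelation lies inside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(residuals))
    return all(abs(r) <= bound for r in residual_autocorrelation(residuals, max_lag))

# Hypothetical residuals with a clear upward trend (structure the model missed).
trending = [0.1 * t - 2.0 for t in range(40)]
print(residual_autocorrelation(trending, max_lag=1))
print(looks_like_white_noise(trending))
```

A trend in the residuals shows up as large positive autocorrelation at short lags, which is the numerical counterpart of the "clear pattern" described above.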
2. Error Metrics
These are quantitative methods for assessing the accuracy of a forecast.
- Mean Absolute Error (MAE):
\[
MAE = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t|
\]
The average of the absolute differences between actual and forecasted values. It provides a simple measure of forecast accuracy.
- Mean Squared Error (MSE):
\[
MSE = \frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2
\]
The average of the squared differences between actual and forecasted values. MSE penalizes larger errors more than smaller ones.
- Root Mean Squared Error (RMSE):
\[
RMSE = \sqrt{MSE}
\]
The square root of MSE, which brings the error measure back to the same unit as the original data. It’s more interpretable when you need to understand the size of the forecast errors in the original units.
- Mean Absolute Percentage Error (MAPE):
\[
MAPE = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100
\]
The average of the absolute percentage errors. This metric is useful when you want to evaluate the relative error regardless of scale. It is undefined when any actual value \(y_t\) is zero and becomes very large when actual values are close to zero.
- Symmetric Mean Absolute Percentage Error (sMAPE):
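\[
sMAPE = \frac{1}{n} \sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{( |y_t| + |\hat{y}_t| ) / 2} \times 100
\]
This version of MAPE avoids some of its pitfalls when actual values are close to zero, because the denominator also includes the forecast. It is still unstable when the actual and forecasted values are both close to zero.
- Theil’s U-statistic: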
A measure that compares the performance of the forecast model to a naive forecasting model (e.g., using the previous period’s value as the forecast). A U value below 1 indicates that the model outperforms the naive forecast, a value of 1 means the two perform equally, and a value above 1 means the naive forecast is more accurate.
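The metrics above translate almost directly into code. The following is a minimal sketch in plain Python; the function names and the sample data are illustrative assumptions. Theil's U has several definitions in the literature, so `theil_u` here is one simplified variant (model RMSE divided by the RMSE of a previous-value naive forecast), not the only form.

```python
import math

def mae(actual, forecast):
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)

def mse(actual, forecast):
    return sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    return math.sqrt(mse(actual, forecast))

def mape(actual, forecast):
    # Raises ZeroDivisionError if any actual value is zero.
    return 100 * sum(abs((y - f) / y) for y, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    # Raises ZeroDivisionError if an actual and its forecast are both zero.
    terms = (abs(y - f) / ((abs(y) + abs(f)) / 2) for y, f in zip(actual, forecast))
    return 100 * sum(terms) / len(actual)

def theil_u(actual, forecast):
    """Simplified variant: model RMSE over naive (previous value) RMSE, for t >= 1."""
    model_sq = sum((actual[t] - forecast[t]) ** 2 for t in range(1, len(actual)))
    naive_sq = sum((actual[t] - actual[t - 1]) ** 2 for t in range(1, len(actual)))
    return math.sqrt(model_sq / naive_sq)

# Hypothetical data for illustration only.
actual = [100, 110, 120, 130]
forecast = [102, 108, 123, 128]

for name, fn in [("MAE", mae), ("MSE", mse), ("RMSE", rmse),
                 ("MAPE", mape), ("sMAPE", smape), ("Theil U", theil_u)]:
    print(f"{name}: {fn(actual, forecast):.3f}")
```

The first observation is skipped in `theil_u` because the naive forecast needs a previous value.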
3. Cross-validation
- Time-Series Cross-Validation: In time series, traditional k-fold cross-validation (used in standard machine learning) isn’t appropriate because the data has temporal dependencies. Instead, time-series cross-validation is used, where the data is split into training sets and testing sets along the time dimension.
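- Rolling Forecast Origin: One common approach is to use a rolling forecast origin, where the point from which forecasts are made moves forward step by step. The training set is extended with each step, and the test set is always the observation (or fixed-length block of observations) immediately after it, so the forecast horizon stays fixed while the test window moves forward in time.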
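A rolling forecast origin with an expanding training window can be sketched as a generator of index splits. The function name and the parameters `initial_train`, `horizon`, and `step` are illustrative assumptions.

```python
def rolling_origin_splits(n_obs, initial_train, horizon=1, step=1):
    """Yield (train_indices, test_indices) pairs with an expanding training window.

    The test block always starts right after the training block, so no
    future observation is ever used for training.
    """
    origin = initial_train
    while origin + horizon <= n_obs:
        yield list(range(origin)), list(range(origin, origin + horizon))
        origin += step

# Eight observations, at least five for training, two-step test blocks.
for train_idx, test_idx in rolling_origin_splits(n_obs=8, initial_train=5, horizon=2):
    print("train:", train_idx, "test:", test_idx)
```

Averaging an error metric over all test blocks gives a more stable estimate than a single train/test split.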
4. Out-of-Sample Forecast Evaluation
- Holdout Set: Reserve a portion of the data for testing (e.g., the last 20% of observations). This set is never used in training the model, and it’s used to test the model’s forecast accuracy.
- Walk-Forward Validation: For each forecast, the training dataset is updated with the new observations as time progresses. Each test set is used only once, and the model is re-evaluated iteratively.
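The holdout and walk-forward ideas can be combined in a short loop: the last portion of the series is held out, and each held-out observation is forecast one step ahead before being added to the history. In this sketch `fit_predict` is a hypothetical callable standing in for whatever model is being evaluated; the default is a naive previous-value forecast, and the series is made up for illustration.

```python
def walk_forward_mae(series, holdout_fraction=0.2, fit_predict=None):
    """One-step-ahead walk-forward MAE over the final holdout portion."""
    if fit_predict is None:
        # Placeholder model (an assumption): forecast the last observed value.
        fit_predict = lambda history: history[-1]
    n_test = max(1, round(len(series) * holdout_fraction))
    split = len(series) - n_test
    history = list(series[:split])
    abs_errors = []
    for actual in series[split:]:
        forecast = fit_predict(history)      # sees only past observations
        abs_errors.append(abs(actual - forecast))
        history.append(actual)               # the observation becomes history
    return sum(abs_errors) / len(abs_errors)

# Hypothetical series for illustration only.
series = [10, 12, 13, 15, 16, 18, 19, 21, 22, 24]
print(walk_forward_mae(series))
```

Each held-out observation is scored exactly once, before the model is allowed to see it.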
5. Bias Evaluation
- Mean Bias Deviation (MBD):
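\[
MBD = \frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)
\]
This statistic evaluates the bias of the model, that is, whether the forecasts tend to overestimate or underestimate the actual values. Because the error is defined as actual minus forecast, a positive MBD means the forecasts are on average below the actual values (underestimation), while a negative MBD means they are on average above the actual values (overestimation).
- Tracking Signal: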
This measures whether the forecast is biased over time. It’s the cumulative sum of the forecast errors divided by the mean absolute deviation (MAD). A value outside of the acceptable range (e.g., ±4) suggests a significant bias.
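Both bias measures can be computed in a few lines. This is a minimal sketch with illustrative function names and made-up data; errors are taken as actual minus forecast, so positive values mean the forecasts ran below the actuals.

```python
def mean_bias_deviation(actual, forecast):
    """Average signed error (actual minus forecast)."""
    return sum(y - f for y, f in zip(actual, forecast)) / len(actual)

def tracking_signal(actual, forecast):
    """Cumulative forecast error divided by the mean absolute deviation."""
    errors = [y - f for y, f in zip(actual, forecast)]
    mad = sum(abs(e) for e in errors) / len(errors)
    return sum(errors) / mad

# Hypothetical forecasts that are consistently too low.
actual = [100, 105, 110, 115, 120]
forecast = [97, 101, 107, 111, 117]

print(mean_bias_deviation(actual, forecast))
print(tracking_signal(actual, forecast))
```

Here every error has the same sign, so the tracking signal equals the number of observations (5) and falls outside the ±4 range mentioned above.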
6. Model Comparison
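- Diebold-Mariano Test: This is a statistical test used to compare the accuracy of two competing forecasting models. It tests the null hypothesis that both models have the same expected loss (for example, the same expected squared error) by checking whether the average difference between their losses is significantly different from zero.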
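The mechanics of the Diebold-Mariano statistic can be sketched as follows. This is a simplified illustration under stated assumptions: squared-error loss, a normal approximation for the p-value, and a long-run variance built from autocovariances up to lag `horizon - 1`. The function name and data are hypothetical, and small-sample corrections are omitted.

```python
import math

def diebold_mariano(actual, forecast_a, forecast_b, horizon=1):
    """Return (DM statistic, two-sided p-value) under squared-error loss."""
    # Loss differential: negative values mean model A had the smaller loss.
    d = [(y - a) ** 2 - (y - b) ** 2
         for y, a, b in zip(actual, forecast_a, forecast_b)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(lag):
        return sum((d[t] - d_bar) * (d[t - lag] - d_bar) for t in range(lag, n)) / n

    # For horizon > 1 this estimate can turn out non-positive in small samples.
    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, horizon))
    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # two-sided, standard normal
    return stat, p_value

# Hypothetical data: model A is consistently closer to the actuals than model B.
actual = [10, 12, 11, 13, 12, 14, 13, 15]
forecast_a = [9.5, 12.5, 10.5, 13.5, 11.5, 14.5, 12.5, 15.5]
forecast_b = [8, 13, 9, 14, 10, 15, 11, 16]

stat, p_value = diebold_mariano(actual, forecast_a, forecast_b)
print(f"DM statistic: {stat:.3f}, p-value: {p_value:.5f}")
```

With only eight observations the normal approximation is rough; the example is meant to show the calculation, not to support a real conclusion.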
7. Confidence Intervals for Forecasts
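- It’s important to assess not only the point forecasts but also the uncertainty around them. Intervals around a forecast (strictly speaking prediction intervals, though they are often loosely called confidence intervals) provide a range within which the future value is expected to lie with a stated probability, such as 95%.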
- A wider interval typically indicates greater uncertainty in the forecast.
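One way to check such intervals against data is to measure how often the actual values fall inside them and how wide they are on average. The sketch below assumes the lower and upper bounds are already available from the forecasting model; the function names and numbers are illustrative.

```python
def interval_coverage(actual, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(actual, lower, upper))
    return hits / len(actual)

def mean_interval_width(lower, upper):
    """Average width of the intervals; wider means more stated uncertainty."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)

# Hypothetical actuals and interval bounds.
actual = [10, 12, 15, 11, 14]
lower = [8, 9, 10, 9, 10]
upper = [12, 13, 14, 13, 15]

print(interval_coverage(actual, lower, upper))
print(mean_interval_width(lower, upper))
```

If nominal 95% intervals cover far fewer than 95% of the actual values on held-out data, the model is understating its uncertainty.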
8. Forecasting Accuracy Over Time
- Evaluate how forecast performance changes over time. For example, if forecasts tend to perform worse in certain periods (e.g., during periods of high volatility), it can indicate that the model needs improvement or adjustment for specific circumstances.
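A simple way to see how accuracy changes over time is to compute an error metric over a moving window. The sketch below uses a rolling MAE; the function name, the window length, and the data are illustrative assumptions.

```python
def rolling_mae(actual, forecast, window=3):
    """MAE over each consecutive window of `window` observations."""
    abs_errors = [abs(y - f) for y, f in zip(actual, forecast)]
    return [
        sum(abs_errors[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(abs_errors))
    ]

# Hypothetical data where the forecasts deteriorate in the second half.
actual = [10, 10, 10, 10, 10, 10]
forecast = [11, 9, 10, 14, 6, 15]

print([round(v, 3) for v in rolling_mae(actual, forecast)])
```

A rising rolling error like the one in this example suggests the model is degrading or that conditions have changed.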
Conclusion
The process of forecast evaluation involves multiple steps to assess both the accuracy and reliability of the forecast. Depending on the context, you may need to use a combination of visual inspections, error metrics, cross-validation, and hypothesis testing to determine the best model for forecasting.