
(ARIMA models) Which model best fits the “Air Travel” series?


Introduction and Business Understanding

We start with the file “Sept11Travel”, which contains three different monthly time series between January 1990 and May 2004. The file has 172 rows, 32 of which are excluded from the data table. In this study we focus only on “Air Travel.” Our goal is to develop regression and ARIMA models and select the one that fits best.

Data Understanding

Using Graph Builder, we can see how the time series looks (Air RPM vs. Month). Turning the smoother up or down either flattens the curve toward a straight line or lets it follow the data closely and show its heavy volatility.

Our first model is a regression model. Before fitting it, we must add two columns to the data table so that the model can account for trend and seasonality. The first column, “Month_Num,” simply counts the rows; we use it as one variable in the regression model. The second column, season, holds the month of the year (1 to 12) with a nominal, character data type, so that JMP creates the dummy variables for the 12 months for us. With these columns in place, we are ready to build the models and analyze them.
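For readers who want to see the two helper columns outside JMP, here is a minimal sketch in plain Python. The function names are hypothetical, and the 0/1 indicator coding with month 12 as the baseline is an assumption made for illustration; JMP's internal coding of nominal columns may differ.

```python
# Hypothetical helpers; the report builds these columns in the JMP data table.

def add_trend_and_season(n_rows, first_month=1):
    """Return (month_num, season) lists for n_rows consecutive monthly rows."""
    month_num = list(range(1, n_rows + 1))            # row counter: 1, 2, 3, ...
    season = [((first_month - 1 + i) % 12) + 1        # calendar month 1..12
              for i in range(n_rows)]
    return month_num, season


def season_dummies(season, baseline=12):
    """0/1 indicator columns for every month except the baseline month."""
    levels = [m for m in range(1, 13) if m != baseline]
    return [[1 if s == m else 0 for m in levels] for s in season]


# Two years of monthly rows starting in January (illustrative length only).
month_num, season = add_trend_and_season(n_rows=24, first_month=1)

# Each design row: intercept, Month_Num, then 11 seasonal indicators.
design = [[1, t] + d for t, d in zip(month_num, season_dummies(season))]

print(design[0])    # January row: the first seasonal indicator is 1
print(design[11])   # December row (baseline): all seasonal indicators are 0
```

Because month 12 is the baseline, it has no column of its own, which matches the idea that one seasonal level does not show up among the estimated coefficients.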


First, we build a regression forecast model using the two variables we created (Appendix A). Season [12] serves as the baseline against which the other months are measured, which is why it does not appear in the Parameter Estimates. Most seasonal coefficients are statistically significant at the 0.05 level, except for months [5] and [10]. R square = 0.97 and adjusted R square = 0.96; we will use these values later to judge which model performs better. After saving the prediction formula from the regression model, we plot its predictions over the original data (Appendix B). If we unexclude the 141 rows, we get the actual forecast (Appendix C). The blue line is the smoothed representation of the actual values, and the red line is the forecast from the regression model. After September 11, the regression forecast is far off: because of the “Month_Num” variable, the model is a straight trend line with a seasonal pattern on top, so it assumes the trend continues and cannot know that a decline follows September 11. Now we want to compare this model with an ARIMA model. We exclude the 141 rows again so that the ARIMA model does not use this data.
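The reason a trend regression keeps pointing upward after a break can be shown with a toy example. The sketch below fits a straight line by ordinary least squares and extrapolates it; all numbers are made up for illustration and are not taken from the Air RPM data, and `fit_line` is a hypothetical helper, not a JMP function.

```python
def fit_line(t, y):
    """Ordinary least squares for y = a + b * t with a single regressor."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx                 # slope
    a = y_mean - b * t_mean       # intercept
    return a, b


# Made-up series: steady growth for 8 periods, then a sudden drop.
t_train = [1, 2, 3, 4, 5, 6, 7, 8]
y_train = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
t_future = [9, 10, 11]
y_future = [15.0, 16.0, 17.0]     # the drop the model never saw

a, b = fit_line(t_train, y_train)
forecast = [a + b * ti for ti in t_future]
errors = [f - actual for f, actual in zip(forecast, y_future)]

for ti, f, e in zip(t_future, forecast, errors):
    print(f"t={ti}: forecast={f:.1f}, overshoot={e:.1f}")
```

The fitted slope depends only on the training periods, so the forecast continues along the old trend no matter what happens afterward.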

We select Air RPM as Y, Time Series, and Month as X, Time ID. Next, we see the time series plot in Appendix D, with the autocorrelation function on the left side and the partial autocorrelation function on the right side. Bars outside the blue confidence limits at particular lags suggest values for p and q.

The partial autocorrelation function indicates the order of the autoregressive process, AR(p), while the autocorrelation function indicates the order of the moving average process, MA(q).
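To make the two diagnostics concrete, here is a minimal sketch that computes the sample autocorrelation and partial autocorrelation (the latter via the Durbin-Levinson recursion) on a simulated AR(1) series. The simulated data, the coefficient 0.8, and the rough band of 2 divided by the square root of n are assumptions for illustration; JMP may compute its blue limits differently.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    out = [1.0]
    phi_prev = []                       # coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)]
        phi_new.append(phi_kk)
        out.append(phi_kk)
        phi_prev = phi_new
    return out


# Simulated AR(1) series with coefficient 0.8 (not the Air RPM data).
rng = random.Random(0)
x = [0.0]
for _ in range(499):
    x.append(0.8 * x[-1] + rng.gauss(0.0, 1.0))

r = acf(x, 12)
p = pacf(x, 12)
limit = 2.0 / math.sqrt(len(x))         # rough two-standard-error band

for k in range(1, 13):
    flag_r = "*" if abs(r[k]) > limit else " "
    flag_p = "*" if abs(p[k]) > limit else " "
    print(f"lag {k:2d}  acf {r[k]:+.3f}{flag_r}  pacf {p[k]:+.3f}{flag_p}")
```

For an AR(1) process one would expect the autocorrelations to decay gradually while the partial autocorrelation is large at lag 1 only, which is the pattern used above to read off p.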

The partial autocorrelation function suggests an AR(p) process of order 1. Because the partial autocorrelation at lag 1 is so close to 1, the series is close to non-stationary, and in that case we get better estimates if we take first differences and model what remains. The same holds for the seasons: at lag [12] the autocorrelation is rather large and the partial autocorrelation is significant. Therefore, we also apply the standard seasonal difference and then try to model the remaining autocorrelation. When the autocorrelation is close to 1, the estimates become very unstable, which is the reason we take differences.
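The effect of the two differences can be sketched in a few lines of Python. The series below is made up (a linear trend plus a repeating 12-month pattern) and `difference` is a hypothetical helper; it only illustrates what first and seasonal differencing remove.

```python
def difference(x, lag=1):
    """Return x[t] - x[t - lag] for every t where both values exist."""
    return [x[i] - x[i - lag] for i in range(lag, len(x))]


# Made-up series: linear trend plus a pattern that repeats every 12 months.
pattern = [0, -2, 3, 1, 4, 6, 9, 8, 2, 1, -1, 3]
series = [100 + 0.5 * t + pattern[t % 12] for t in range(48)]

seasonal = difference(series, lag=12)   # removes the repeating pattern
both = difference(seasonal, lag=1)      # then removes what the trend left

print(seasonal[:5])   # constant values: only the trend's yearly step remains
print(both[:5])       # zeros: nothing systematic is left in this toy series
```

On real data the twice-differenced series is not zero, of course; what remains is the autocorrelation that the AR and MA terms are then asked to model.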

We selected a Seasonal ARIMA process for two reasons: the partial autocorrelation at lag 1 is relatively large, so we take the first difference, and the partial autocorrelation around the seasonal lag [12] is also large, so we take the seasonal difference. The order-1 term captures the dependence between successive months such as July and August. The residuals of this process are shown in Appendix E.

After the seasonal difference and the first difference, the residuals are much tamer, and we can now fit a model. The ARIMA process needs a seasonal component, MA [12], along with AR [1] or MA [1]. We try these models first to keep them simple to understand. We fit a Seasonal ARIMA with orders (1,1,1) and (1,1,1) and compare it with the original (0,1,0) and (0,1,0). Adjusted R square increased, which means the added terms are useful (Appendix F). AR1,1 is statistically significant, as is MA1,1, but AR2,12 and MA2,12 are not. AR2,12 is very close to being statistically significant, but we exclude it and look closely at MA2,12. The next step is to try model (1,1,1) and (0,1,1), which drops the seasonal autoregressive term. As Appendix G shows, we get a better result: the residuals look very good and stay within the limits, all estimates are statistically significant, and adjusted R square improves to 0.979. Next, we want to try a different model and see if we can improve the forecast results.
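Written out with the backshift operator $B$ (where $B y_t = y_{t-1}$), the two seasonal models discussed here take the following form. This is a sketch under assumptions: the symbols $\phi_1, \theta_1, \Phi_1, \Theta_1$ are introduced here for illustration and presumably correspond to AR1,1, MA1,1, AR2,12 and MA2,12 in the JMP output; the season length is taken to be 12; any constant term is omitted; and the minus sign on the moving average terms is one common convention that not all software shares.

```latex
% Seasonal ARIMA (1,1,1) and (1,1,1), season length 12
(1 - \phi_1 B)\,(1 - \Phi_1 B^{12})\,(1 - B)\,(1 - B^{12})\, y_t
  = (1 - \theta_1 B)\,(1 - \Theta_1 B^{12})\, \varepsilon_t

% Seasonal ARIMA (1,1,1) and (0,1,1): the seasonal AR factor is dropped
(1 - \phi_1 B)\,(1 - B)\,(1 - B^{12})\, y_t
  = (1 - \theta_1 B)\,(1 - \Theta_1 B^{12})\, \varepsilon_t
```

The factors $(1 - B)$ and $(1 - B^{12})$ are the first and seasonal differences, so the AR and MA terms only describe what is left after differencing.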

We tried model (1,1,1) and (0,0,0), which did not give good results, so our next attempt is model (1,1,1) and (1,1,0) (Appendix H). By excluding the seasonal moving average order, we get an improved result: adjusted R square increased a little, and the residual at lag [24] moved closer to the limits. Plotting these models in JMP using the prediction formulas, we can see that up to September 2001 the regression and ARIMA forecasts are quite similar and comparable. Once we include the 141 rows again, however, the red curve, which is the regression line, keeps climbing, while the green and purple ARIMA lines quickly follow the changing pattern and stay close to the blue line. Their deviation is much smaller than that of the regression model, with less autocorrelation (Appendix I).
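The report compares the forecasts visually. One way to put a number on "deviation" is the root mean squared error over the held-out months; the sketch below shows the calculation. RMSE is a swapped-in metric chosen for illustration, not one reported in the source, and all values are made up rather than taken from the JMP output.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up hold-out values and two made-up forecasts.
actual = [15.0, 16.0, 17.0, 18.0]
trend_forecast = [26.0, 28.0, 30.0, 32.0]      # keeps climbing on the old trend
adaptive_forecast = [16.0, 15.0, 18.0, 18.0]   # follows the new level

rmse_trend = rmse(actual, trend_forecast)
rmse_adaptive = rmse(actual, adaptive_forecast)

print(f"trend forecast RMSE:    {rmse_trend:.2f}")
print(f"adaptive forecast RMSE: {rmse_adaptive:.2f}")
```

A smaller value means the forecast stays closer to the actual series on the months it was not fitted to.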


Two ARIMA models and a regression forecast model were used in this assignment to select the model that best fits “Air Travel.” The ARIMA deviation is much smaller than the deviation of the regression line. Adjusted R square was highest for the ARIMA model (1,1,1) and (1,1,0), but model (1,1,1) and (0,1,1) came very close.

Both ARIMA models had statistically significant parameters, a high adjusted R square, and residuals with low autocorrelation.

Overall, up to September 11, the regression and ARIMA forecasts are quite similar and comparable. ARIMA produced more accurate results than ordinary regression estimation because it accounts for dependencies between observations. The ARIMA models reproduced the observed series well.

Research Provided by Andrey Fateev


Appendix A. Parameter Estimates

Appendix B. Data from Reg. Model combined Original Data Plot.

Appendix C. Forecast Unexcluded 141 rows

Appendix D. Time Series of Air RPM, Basic Diagnostic.

Appendix E. Residuals of ARIMA process

Appendix F. Seasonal Component, Parameter Estimates for 1,1,1 and 1,1,1

Appendix G. Model (1,1,1) and (0,1,1)

Appendix H. (1,1,1) and (1,1,0)

Appendix I. 2 ARIMA models and Regression Model


