MAT 3379 INTRODUCTION TO TIME SERIES (WINTER 2021)

# Step by step data analysis

Northern Hemisphere.txt;
Then plot it together with ACF.
```Temperature=scan("NorthernHemisphere.txt");
par(mfrow=c(1,1))
plot.ts(Temperature);
par(mfrow=c(1,2))
acf(Temperature); pacf(Temperature)
```
The sequence is clearly not stationary. We need to remove trend.
2. To remove the trend, I will apply the exponential smoothing:
```par(mfrow=c(1,1))
plot.ts(Temperature);
MySmoothedTS=ExpSmooth(Temperature,0.1);
Stationary=Temperature-MySmoothedTS;
points(MySmoothedTS,col="blue",type="l");
```
For time series to analyse I will use
```TimeSeries=Stationary;
```
3. Which model?
```par(mfrow=c(1,2))
acf(TimeSeries);pacf(TimeSeries)
```
AR(4) looks like a good candidate.
4. Estimation using the Yule-Walker procedure.
```fit.ar.yw<-ar(TimeSeries,method="yule-walker");
fit.ar.yw
```
Output:
```Call:
ar(x = TimeSeries, method = "yule-walker")

Coefficients:
1        2        3        4
0.1745   0.1218  -0.0529   0.2855

Order selected 4  sigma^2 estimated as  0.03412
```
5. The estimated coefficients are:
```phi.yw=fit.ar.yw\$ar
```
6. Model diagnostic: Is AR(4) fit appropriate? Compute "Residuals":
```n=length(TimeSeries);
Residuals.yw<-fit.ar.yw\$resid;
Residuals.yw=na.omit(Residuals.yw);
par(mfrow=c(1,2));
acf(Residuals.yw); pacf(Residuals.yw);
```
There is no dependence left in residuals (although you can argue that there is a significant lag at 12 or 13). The fit is appropriate.
7. As the comparison, we apply the maximum likelihhod procedure. First, you need to check normality:
```par(mfrow=c(1,2))
qqnorm(TimeSeries); hist(TimeSeries);
```
You can apply the MLE.
```fit.ar.mle<-ar(TimeSeries,method="mle");
fit.ar.mle
```
Output:
```Call:
ar(x = TimeSeries, method = "mle")

Coefficients:
1        2        3        4        5
0.1427   0.1290  -0.0682   0.2716   0.1187

Order selected 5  sigma^2 estimated as  0.03241
```
The same output will be given by
```arima(TimeSeries,order=c(5,0,0),method="ML")
```
Note that MLE procedure selected the different order. There is nothing wrong with that!
8. Is the MLE fit appropriate?
```Residuals.mle=fit.ar.mle\$resid;
Residuals.mle=na.omit(Residuals.mle);
par(mfrow=c(1,2));
acf(Residuals.mle); pacf(Residuals.mle);
```
Yes! The fit is ok.
9. We have two competing models. Which one should we choose? For example, check the quality of the prediction:
```Squared.Error.yw=mean((Residuals.yw)^2);
Squared.Error.mle=mean((Residuals.mle)^2);
Squared.Error.yw; Squared.Error.mle;
```
MLE yields lower error.
10. Why AR(5) was selected? We calculate AIC.
```fit.mle.4<-arima(TimeSeries,order=c(4,0,0),method="ML");
fit.mle.5<-arima(TimeSeries,order=c(5,0,0),method="ML");
fit.mle.4; fit.mle.5;
```
Note the values of log-likelihod and AIC. To check that they make sense, we calculate
```n=length(TimeSeries);
log.lik.4=46.25;
log.lik.5=47.34;
p=4; q=0; k=1; term.4=2*(p+q+k+1);
p=5; q=0; k=1; term.5=2*(p+q+k+1);
AIC.4=-2*log.lik.4+term.4; AIC.4;
AIC.5=-2*log.lik.5+term.5; AIC.5
```
11. Use the model coming from the MLE procedure and predict the next 10 observations by
```k=20
```
12. We need to get back to the original data. Let's see first what happens if we ignore the trend. Ignoring trend
```	n=length(Temperature);k=20;
prediction.1=prediction+Temperature[n];
prediction.1.upper=prediction.1+error;
prediction.1.lower=prediction.1-error;
dummy.ts=c(rep(NA,k));
NewTemperature=c(Temperature,dummy.ts);
dummy.pred=c(rep(NA,n));
PredictedTimeSeries=c(dummy.pred,prediction.1);
PredictionUpperLimit=c(dummy.pred,prediction.1.upper);
PredictionLowerLimit=c(dummy.pred,prediction.1.lower);
par(mfrow=c(1,1))
plot.ts(NewTemperature,ylim=c(-1,2),main="Ignoring Trend");
points(PredictedTimeSeries,col="red",type="p");
points(PredictionUpperLimit,col="green",type="l");
points(PredictionLowerLimit,col="green",type="l");
```
We can see that starting year 120, we have more or less linear trend. Thus, I take this part of the data and will fit a linear trend there.
```	n=length(Temperature);
Time=seq(1,n,by=1);
lin.reg=lm(Temperature[120:n]~Time[120:n]);
Lin.Trend=0.03065*Time-3.96023;
plot.ts(Temperature);
points(Lin.Trend,col="blue",type="l")
```
Then, I extend the linear trend and add prediction:
```	k=20;
dummy.ts=c(rep(NA,k));
NewTemperature=c(Temperature,dummy.ts);
dummy.trend=c(rep(NA,n));
Time=seq(1,n+k,by=1);
Extended.Trend=0.03065*Time-3.96023;
Trend=c(dummy.trend,Extended.Trend[(n+1):(n+k)]);
y.max=2;
y.min=min(Temperature);
par(mfrow=c(1,2))
plot.ts(Temperature);
points(Lin.Trend,col="blue",type="l");
plot.ts(NewTemperature,ylim=c(y.min,y.max));
points(Trend,col="blue",type="l");
Prediction.stationary=c(dummy.trend,prediction);
PredictedTimeSeries=Trend+Prediction.stationary
points(PredictedTimeSeries,col="red",type="p")
```