MAT 3379
INTRODUCTION TO TIME SERIES (WINTER 2021)

R Examples Part 5

Step by step data analysis

  1. Download NorthernHemisphere.txt and read it into R;
    then plot it together with the ACF and PACF.
    Temperature=scan("NorthernHemisphere.txt");
    par(mfrow=c(1,1))
    plot.ts(Temperature); 
    par(mfrow=c(1,2))
    acf(Temperature); pacf(Temperature)
    
    The sequence is clearly not stationary. We need to remove the trend.
  2. To remove the trend, I will apply exponential smoothing:
    par(mfrow=c(1,1))
    plot.ts(Temperature);
    MySmoothedTS=ExpSmooth(Temperature,0.1);
    Stationary=Temperature-MySmoothedTS;
    points(MySmoothedTS,col="blue",type="l");
    
    As the time series to analyse, I will use
    TimeSeries=Stationary;
    
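Note that ExpSmooth is a course helper, not a base-R function. A minimal sketch of how such a helper can be defined (the recursion below is my assumption about its definition, not base R):

```r
# Sketch of an ExpSmooth-style helper (assumed definition, not base R):
# one-sided exponential smoothing m[1]=x[1], m[t]=a*x[t]+(1-a)*m[t-1].
ExpSmooth <- function(x, a) {
  m <- numeric(length(x))
  m[1] <- x[1]
  for (t in 2:length(x)) {
    m[t] <- a * x[t] + (1 - a) * m[t - 1]
  }
  m
}
```

With a = 1 the smoother returns the series unchanged; a small a (such as 0.1 above) gives a slowly varying trend estimate.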
  3. Which model?
    par(mfrow=c(1,2))
    acf(TimeSeries);pacf(TimeSeries)
    
    AR(4) looks like a good candidate.
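The PACF is the key tool here: for an AR(p) process the PACF cuts off after lag p. A quick check on simulated data (a simulated AR(4), not the temperature series; the coefficients below are arbitrary):

```r
# For an AR(p) process the sample PACF should cut off after lag p;
# simulate an AR(4) and inspect pacf().
set.seed(1)
sim <- arima.sim(list(ar = c(0.17, 0.12, -0.05, 0.29)), n = 1000)
pv <- pacf(sim, plot = FALSE)$acf
band <- 1.96 / sqrt(1000)
# lags beyond 4 should mostly fall inside the +/- 1.96/sqrt(n) band
abs(pv[1:4]); abs(pv[5:10]) < band
```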
  4. Estimation using the Yule-Walker procedure.
    fit.ar.yw<-ar(TimeSeries,method="yule-walker");
    fit.ar.yw
    
    Output:
    Call:
    ar(x = TimeSeries, method = "yule-walker")
    
    Coefficients:
          1        2        3        4  
     0.1745   0.1218  -0.0529   0.2855  
    
    Order selected 4  sigma^2 estimated as  0.03412
    
  5. The estimated coefficients are:
    phi.yw=fit.ar.yw$ar
    
  6. Model diagnostic: Is AR(4) fit appropriate? Compute "Residuals":
    n=length(TimeSeries);
    Residuals.yw<-fit.ar.yw$resid;
    Residuals.yw=na.omit(Residuals.yw);
    par(mfrow=c(1,2));
    acf(Residuals.yw); pacf(Residuals.yw);
    
    There is no dependence left in the residuals (although one could argue that there is a significant lag around 12 or 13). The fit is appropriate.
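Beyond eyeballing the residual ACF/PACF, base R's Box.test gives a formal portmanteau check (demonstrated here on simulated white noise standing in for the residuals):

```r
# Ljung-Box portmanteau test: H0 = the residuals are white noise.
set.seed(3)
resid.demo <- rnorm(200)   # stand-in for Residuals.yw
Box.test(resid.demo, lag = 20, type = "Ljung-Box")
# On the real residuals, fitdf discounts the 4 estimated AR coefficients:
# Box.test(Residuals.yw, lag = 20, type = "Ljung-Box", fitdf = 4)
```

A large p-value is consistent with white-noise residuals, i.e. with an adequate fit.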
  7. For comparison, we apply the maximum likelihood procedure. First, you need to check normality:
    par(mfrow=c(1,2))
    qqnorm(TimeSeries); hist(TimeSeries);
    
    The data look approximately normal, so you can apply the MLE.
    fit.ar.mle<-ar(TimeSeries,method="mle");
    fit.ar.mle
    
    Output:
    Call:
    ar(x = TimeSeries, method = "mle")
    
    Coefficients:
          1        2        3        4        5  
     0.1427   0.1290  -0.0682   0.2716   0.1187  
    
    Order selected 5  sigma^2 estimated as  0.03241
    
    The same output will be given by
    arima(TimeSeries,order=c(5,0,0),method="ML")
    
    Note that the MLE procedure selected a different order. There is nothing wrong with that!
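A formal companion to the QQ-plot is the Shapiro-Wilk normality test (base R's shapiro.test; shown here on simulated data standing in for TimeSeries):

```r
# Shapiro-Wilk test: H0 = the data are normally distributed.
set.seed(4)
x <- rnorm(300)            # stand-in for TimeSeries
qqnorm(x); qqline(x)       # qqline adds a reference line to the QQ-plot
shapiro.test(x)            # a large p-value: no evidence against normality
```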
  8. Is the MLE fit appropriate?
    Residuals.mle=fit.ar.mle$resid;
    Residuals.mle=na.omit(Residuals.mle);
    par(mfrow=c(1,2));
    acf(Residuals.mle); pacf(Residuals.mle);
    
    Yes! The fit is ok.
  9. We have two competing models. Which one should we choose? For example, check the quality of the prediction:
    Squared.Error.yw=mean((Residuals.yw)^2);
    Squared.Error.mle=mean((Residuals.mle)^2);
    Squared.Error.yw; Squared.Error.mle;
    
    MLE yields lower error.
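In-sample MSE tends to favour the larger model almost by construction. A sketch of an out-of-sample comparison instead (simulated data; the train/test split is my addition, not part of the handout):

```r
# Out-of-sample comparison: fit on the first 90%, score on the last 10%.
set.seed(5)
x <- arima.sim(list(ar = c(0.2, 0.1, -0.05, 0.3)), n = 400)
m <- 360
fit.yw  <- ar(x[1:m], method = "yule-walker")
fit.mle <- ar(x[1:m], method = "mle")
h <- length(x) - m
mse.yw  <- mean((x[(m + 1):(m + h)] - predict(fit.yw,  n.ahead = h)$pred)^2)
mse.mle <- mean((x[(m + 1):(m + h)] - predict(fit.mle, n.ahead = h)$pred)^2)
c(mse.yw, mse.mle)
```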
  10. Why was AR(5) selected? We calculate the AIC.
    fit.mle.4<-arima(TimeSeries,order=c(4,0,0),method="ML");
    fit.mle.5<-arima(TimeSeries,order=c(5,0,0),method="ML");
    fit.mle.4; fit.mle.5;
    
    Note the values of the log-likelihood and AIC. To check that they make sense, we calculate
    n=length(TimeSeries);
    log.lik.4=46.25;
    log.lik.5=47.34;
    p=4; q=0; k=1; term.4=2*(p+q+k+1);
    p=5; q=0; k=1; term.5=2*(p+q+k+1);
    AIC.4=-2*log.lik.4+term.4; AIC.4;
    AIC.5=-2*log.lik.5+term.5; AIC.5
    
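The hand computation matches base R's AIC() on arima fits, since the parameter count for an arima fit with a mean is p+q+k+1 (AR and MA coefficients, plus the intercept, plus sigma^2). A self-contained check on simulated data:

```r
# Check that AIC = -2*logLik + 2*(p+q+k+1) for an arima fit with a mean.
set.seed(6)
x <- arima.sim(list(ar = c(0.2, 0.1, -0.05, 0.3)), n = 300)
fit4 <- arima(x, order = c(4, 0, 0), method = "ML")
p <- 4; q <- 0; k <- 1
aic.by.hand <- -2 * as.numeric(logLik(fit4)) + 2 * (p + q + k + 1)
c(AIC(fit4), aic.by.hand)   # the two numbers coincide
```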
  11. Use the model coming from the MLE procedure and predict the next k = 20 observations by
    k=20
    forecast=predict(fit.ar.mle,n.ahead=k);
    prediction=forecast$pred;
    error=forecast$se
    
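predict() on an ar/arima fit returns both the point forecasts ($pred) and their standard errors ($se). Step 12 below draws bands at one standard error; pred +/- 1.96*se would give approximate 95% limits instead. A small self-contained demo:

```r
# predict() returns $pred (forecasts) and $se (standard errors);
# approximate 95% limits are pred +/- 1.96*se.
set.seed(7)
x <- arima.sim(list(ar = 0.6), n = 200)
fit <- ar(x, method = "mle")
fc <- predict(fit, n.ahead = 10)
upper <- fc$pred + 1.96 * fc$se
lower <- fc$pred - 1.96 * fc$se
cbind(lower, fc$pred, upper)
```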
  12. We need to get back to the original data. Let's first see what happens if we ignore the trend.
    	n=length(Temperature);k=20;
    	prediction.1=prediction+Temperature[n];
    	prediction.1.upper=prediction.1+error;
    	prediction.1.lower=prediction.1-error;
    	dummy.ts=c(rep(NA,k));
    	NewTemperature=c(Temperature,dummy.ts);
    	dummy.pred=c(rep(NA,n));
    	PredictedTimeSeries=c(dummy.pred,prediction.1);
    	PredictionUpperLimit=c(dummy.pred,prediction.1.upper);
    	PredictionLowerLimit=c(dummy.pred,prediction.1.lower);
    	par(mfrow=c(1,1))
    	plot.ts(NewTemperature,ylim=c(-1,2),main="Ignoring Trend");
    	points(PredictedTimeSeries,col="red",type="p");
    	points(PredictionUpperLimit,col="green",type="l");
    	points(PredictionLowerLimit,col="green",type="l");
    
    We can see that, starting around time 120, there is a more or less linear trend. Thus, I take this part of the data and fit a linear trend there.
    	n=length(Temperature);
    	Time=seq(1,n,by=1);
    	lin.reg=lm(Temperature[120:n]~Time[120:n]);
    	Lin.Trend=coef(lin.reg)[2]*Time+coef(lin.reg)[1];  # i.e. 0.03065*Time-3.96023
    	plot.ts(Temperature);
    	points(Lin.Trend,col="blue",type="l")
    
    Then, I extend the linear trend and add prediction:
    	k=20;
    	dummy.ts=c(rep(NA,k));
    	NewTemperature=c(Temperature,dummy.ts);
    	dummy.trend=c(rep(NA,n));
    	Time=seq(1,n+k,by=1);
    	Extended.Trend=coef(lin.reg)[2]*Time+coef(lin.reg)[1];  # i.e. 0.03065*Time-3.96023
    	Trend=c(dummy.trend,Extended.Trend[(n+1):(n+k)]);
    	y.max=2;
    	y.min=min(Temperature);
    	par(mfrow=c(1,2))
    	plot.ts(Temperature);
    	points(Lin.Trend,col="blue",type="l");
    	plot.ts(NewTemperature,ylim=c(y.min,y.max));
    	points(Trend,col="blue",type="l");
    	Prediction.stationary=c(dummy.trend,prediction);
    	PredictedTimeSeries=Trend+Prediction.stationary
    	points(PredictedTimeSeries,col="red",type="p")