6 Estimation and Results
The model was estimated using hourly data for 1 Jan 1985 through 31 Dec
2015. The analysis was conducted in two distinct stages.
In the first stage, the functional form given by Eq. (1) was evaluated.
A nonlinear functional form was subsequently identified.
The analysis also recognizes that the variance of the
disturbance term in the regression equation is heteroskedastic
rather than homoskedastic, i.e., variable instead of constant over time.
As suggested in the previous section, the accepted approach involves
estimating an ARCH model. This approach was proposed by Engle (1982) to
improve the analysis of financial data. It has since proven itself
invaluable in modeling any time-series variable in which there are
periods of turbulence followed by relative calm at some point. Hourly
temperature is one of those variables. Those tempted to claim otherwise
are cheerfully invited to consult the book entitled “Environmental
Econometrics Using Stata,” authored by Baum and Hurn (2021).
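To illustrate the volatility clustering that an ARCH model is designed to capture, the following minimal sketch simulates an ARCH(1) disturbance and compares the lag-1 autocorrelation of the levels with that of the squares. The parameter values (`omega`, `alpha`), the seed, and the function names are illustrative assumptions and are not taken from the paper's estimates.

```python
import math
import random


def simulate_arch1(n, omega=1.0, alpha=0.4, seed=0):
    """Simulate e_t = sigma_t * z_t with sigma_t^2 = omega + alpha * e_{t-1}^2."""
    rng = random.Random(seed)
    e_prev = 0.0
    series = []
    for _ in range(n):
        sigma2 = omega + alpha * e_prev ** 2  # conditional variance
        e_prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        series.append(e_prev)
    return series


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


e = simulate_arch1(20000)
rho_levels = autocorr(e, 1)                    # levels: little linear dependence
rho_squares = autocorr([v * v for v in e], 1)  # squares: positive dependence
print(f"levels: {rho_levels:.3f}, squares: {rho_squares:.3f}")
```

In a series of this kind the levels look nearly uncorrelated while the squares do not, which is the signature that a test for ARCH effects looks for.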
The second estimation stage also recognizes that the temperature in hour
t is not statistically independent from the temperature outcomes in
previous hours, as seen in Figure 9. As suggested in the previous
section, this dependence is modeled using an ARMAX specification. In this case, the
transformed explanatory variables from the first stage (e.g.,
\(\text{Solar}_{t}^{1/4}\)) are the exogenous inputs.
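As a minimal sketch of how such a transformed input could be constructed, the function below applies a fractional power to a non-negative regressor, using the fractional-polynomial convention that a power of zero denotes the natural logarithm. The function name `fp_transform` and the sample irradiance values are hypothetical; only the quarter power is taken from the text.

```python
import math


def fp_transform(x, power):
    """Return x**power, with power 0 read as ln(x) (fractional-polynomial convention)."""
    if power == 0:
        if x <= 0:
            raise ValueError("ln(x) requires x > 0")
        return math.log(x)
    if x < 0:
        raise ValueError("fractional powers require x >= 0")
    if x == 0 and power < 0:
        raise ValueError("negative powers require x > 0")
    return x ** power


# Hypothetical hourly solar irradiance values (night, morning, midday).
solar = [0.0, 16.0, 625.0]
solar_quarter = [fp_transform(s, 0.25) for s in solar]
print(solar_quarter)
```

The quarter power compresses large irradiance values far more than small ones, so the transformed input responds most strongly at low light levels.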
Given this specification, the disturbance terms are presumed to follow
an ARMA specification that models the autocorrelations reported in
Figure 9. The ARMA specification applied in this paper is not
parsimonious because the autocorrelative process in Figure 9 is not
short in duration. It is recognized that this approach runs counter to
the traditional time-series philosophy of parsimony (Box and Jenkins, 1976, p. 17),
which holds that there is more room for prediction error when more
time-series parameters are estimated (Hamilton, 1994, p. 106). The view here is that the goal
of predictive accuracy can sometimes be enhanced by including more ARMA
terms. This approach makes sense given the long memory property of the
autocorrelations evidenced in Figure 9 and the high level of variability
in temperature, as evidenced by Figure 5. The heteroskedasticity is
modeled as a function of the solar zenith angle, the hour of the day,
the day of the year, the year of the sample, and the following
variables: \(\sqrt{\text{CO2}_{t-1}}\) and \(\sqrt{\text{Solar}_{t}}\). Instead
of assuming that hourly temperature is independent of the conditional
variance, the model permits the data to speak for itself on this issue.
This linkage is relevant if the level of a variable depends on the
variance in the disturbance term. The ARCH-in-mean model introduced by
Engle et al. (1987) offers an approach to estimating this linkage.
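One generic way to write such a linkage is sketched below. The notation (\(y_{t}\), \(x_{t}\), \(z_{t}\), \(\theta\), \(\psi\), \(a_{j}\), \(\lambda\)) is introduced here purely for illustration and is not the paper's own. The exponential form of the variance intercept is an assumption, chosen because it keeps that intercept positive.

\[
y_{t} = x_{t}'\theta + \psi\,\sigma_{t}^{2} + u_{t},
\qquad
\sigma_{t}^{2} = \exp(z_{t}'\lambda) + \sum_{j=1}^{q} a_{j}\,\varepsilon_{t-j}^{2}
\]

Here \(u_{t}\) is an ARMA disturbance driven by the innovation \(\varepsilon_{t}\), \(x_{t}\) collects the exogenous inputs of the mean equation, and \(z_{t}\) collects the variables that drive the heteroskedasticity. A nonzero \(\psi\) means that the level of the dependent variable responds to the conditional variance, while \(\psi = 0\) recovers the case in which the two are unrelated.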
The possible merits of representing
the explanatory variables using a nonlinear specification are addressed
using the multivariable fractional polynomial (MFP) methodology (Royston
and Sauerbrei, 2008). Its applications include Forbes and St. Cyr (2017,
2019) and Forbes and Zampelli (2019, 2020). The methodology considers the
effects of nonlinear transformations of the explanatory variables. In
the present case, the MFP results suggest the following specification:
\(\ln \text{Temp}_{t} = \alpha_{0}' + \alpha_{1}'\,\text{ZeroSolar}_{t} + \alpha_{2}'\,\text{Solar}_{t}^{1/4} + \alpha_{3}'\,(\text{CO2}_{t-1} \cdot \text{ZeroSolar}_{t})^{3}\)
\(+\ \alpha_{4}'\,(\text{CO2}_{t-1} \cdot \text{PosSolar}_{t})^{1/4} + \alpha_{5}'\,(\text{Solar}_{t} \cdot \text{CO2}_{t-1})^{1/4} + \sum_{h=1}^{9} \beta_{h}'\,\text{Angle}_{h}\)
\(+\ \sum_{i=2}^{24} \phi_{i}'\,\text{HourofDay}_{i} + \sum_{j=2}^{365} \gamma_{j}'\,\text{DOY}_{j} + \sum_{k=1985}^{2014} \delta_{k}'\,\text{Year}_{k}\) (2)
Here \(\alpha_{1}'\), \(\alpha_{2}'\), \(\alpha_{3}'\), etc. are
the estimated coefficients in this specification. Least squares
estimation of (2) produces a seemingly respectable level of explanatory
power, the \(R^{2}\) being about 0.831. However, a
Portmanteau test for autocorrelation (Box and Pierce, 1970; Ljung and
Box, 1978) reveals that the residuals are highly autocorrelated.
Consistent with Forbes and St. Cyr (2019, p. 17), for lags 1 through
100, the p-values are less than 0.0001. The null hypothesis of no
ARCH effects is also rejected, with a p-value less than 0.0001.
Given these issues, the least-squares model is not useful.
This finding is supported by out-of-sample predictions over the 1
Jan 2016 - 31 Aug 2017 interval that have a root-mean-squared error
(RMSE) of about 5.67 °C, a value that is clearly
indicative of a suboptimal prediction process.
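The out-of-sample RMSE quoted above is the square root of the mean squared difference between observed and predicted values. A minimal sketch follows; the function name `rmse` and the sample temperatures are hypothetical and are not the paper's data.

```python
import math


def rmse(observed, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical observed and predicted hourly temperatures in degrees C.
observed = [-12.0, -10.5, -9.0, -11.0]
predicted = [-10.0, -10.5, -12.0, -11.0]
error = rmse(observed, predicted)
print(round(error, 3))
```

Because the dependent variable in Eq. (2) is the logarithm of temperature, predictions from such a model would need to be converted back to temperature units before an RMSE in degrees is computed.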
ARCH/ARMAX methods can generate predictions that are much more accurate
than the predictions from a least-squares model when the dependent
variable is autoregressive and heteroskedastic in nature. In this case,
the ARCH process is modeled with lags 1 and 2. Consideration
was given to including additional ARCH terms to model the apparent
diurnal pattern of the ARCH process (e.g., lags 24, 48, 72, 96, etc.).
Consideration was also given to employing alternative ARCH and GARCH
specifications. These approaches were abandoned due to model convergence
issues. The modeled lag lengths for the AR process are 1 through 12, 23,
24, 25, 26, 47, 48, 49, 71, 72, 73, 96, 97, 120, 121, 144, 145, 167,
168, 169, 192, 193, 216, 240, 264, 288, 312, 335, 336, 337, 360, 384,
408, 432, 456, 480, 600, 671, 672, 673, 840, and 960. The MA modeled lag
lengths are 1 through 25, 48, 49, 71, 72, 73, 96, 97, 120, 121, 144,
145, 167, 168, 169, 192, 193, 216, 240, 264, 288, 312, 335, 336, 337,
360, 384, 408, 432, 456, 480, 600, 671, 672, 673, 840, and 960.
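The structure of these sparse lag sets is easier to see when they are written out as data. The sketch below transcribes the AR and MA lags listed above and checks how far each longer lag sits from a whole number of days. The variable and function names are hypothetical, and the one-hour tolerance is an illustrative choice.

```python
# AR and MA lag sets as listed in the text above.
ar_lags = list(range(1, 13)) + [
    23, 24, 25, 26, 47, 48, 49, 71, 72, 73, 96, 97, 120, 121, 144, 145,
    167, 168, 169, 192, 193, 216, 240, 264, 288, 312, 335, 336, 337, 360,
    384, 408, 432, 456, 480, 600, 671, 672, 673, 840, 960,
]
ma_lags = list(range(1, 26)) + [
    48, 49, 71, 72, 73, 96, 97, 120, 121, 144, 145, 167, 168, 169, 192,
    193, 216, 240, 264, 288, 312, 335, 336, 337, 360, 384, 408, 432, 456,
    480, 600, 671, 672, 673, 840, 960,
]


def hours_from_daily_multiple(lag):
    """Distance in hours from the lag to the nearest multiple of 24."""
    remainder = lag % 24
    return min(remainder, 24 - remainder)


# Lags beyond the first day that are more than one hour from a whole day.
off_cycle_ar = [lag for lag in ar_lags
                if lag > 24 and hours_from_daily_multiple(lag) > 1]
off_cycle_ma = [lag for lag in ma_lags
                if lag > 25 and hours_from_daily_multiple(lag) > 1]

print(len(ar_lags), len(ma_lags), off_cycle_ar, off_cycle_ma)
```

Written this way, the specification has 53 AR terms and 61 MA terms. Apart from AR lag 26, every lag beyond the first day falls within one hour of a whole number of days, which is consistent with a diurnal pattern in the autocorrelations.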
Equation (2) was estimated assuming that the residual error terms
follow a Student t distribution instead of the more typical
Gaussian distribution. This approach is believed to be justified by the
highly volatile nature of the weather system in the vicinity of Barrow.
One shortcoming in its application here is that the “degrees of
freedom” parameter is less than the minimum indicated by Harvey (2013,
p. 20). Consideration was given to modeling the residual error terms
using the generalized error distribution, but this approach was
abandoned due to model convergence issues.
Selected estimates are reported in Table 1. It is revealed that \(\alpha_{2}'\), the coefficient corresponding to
\(\text{Solar}_{t}^{1/4}\), is positive and highly
statistically significant. The CO2 coefficients \(\alpha_{3}'\) and \(\alpha_{4}'\) are also positive
and highly statistically significant, while \(\alpha_{5}'\) is
negative and highly statistically significant. These findings are
consistent with the view that CO2 concentrations have
implications for hourly temperature but do not address the magnitude.
Concerning the possible non-anthropogenic drivers of temperature, it
is interesting to note that 16 of the 30 year variables are
statistically significant. With 2015 being represented in the constant
term, negative values for a year are consistent with higher predicted
temperatures in 2015 than in the year in question. There are 13 such
cases. For these cases, the coefficients’ median value is -0.00543, a
value that hardly seems important.
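Because the dependent variable in Eq. (2) is the logarithm of temperature, a year coefficient can be read as an approximate proportional difference relative to 2015, holding the other terms fixed. As a worked illustration using the median value quoted above (the reading as a proportional effect is a standard property of log-linear models, not a result reported by the paper):

\[
\exp(-0.00543) - 1 \approx -0.0054
\]

That is, the predicted temperature for the median such year is roughly 0.54 percent lower than for 2015, in whatever units the temperature variable is measured.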
The model’s explanatory power based on the estimated structural
parameters (all the parameter estimates) is 0.8105 (0.9968). Those
who believe that the latter level of explanatory power is somehow “too
outstanding to be true,” are cheerfully invited to reinspect Figure 9
and contemplate the concept of autocorrelation and how modeling this
autocorrelation can affect a model’s level of explanatory power. In any
event, the view here follows Hyndman and Athanasopoulos (2018, Section 3.4), who
note that true adequacy “can only be determined by considering
how well a model performs on new data that were not used when fitting
the model.” It is also noted that even though a model’s
\(R^{2}\) equivalence is a well-recognized measure of model
adequacy, a good case can be made that achieving white noise in the
residuals is also important (Becketti, 2013, p. 256; Kennedy,
2008, p. 315; and Granger and Newbold, 1974, p. 119). To assess whether
this measure of adequacy is achieved, Portmanteau tests for
autocorrelation were conducted for the hourly lags 1 through 100, 192,
284, and 672. At lag 1, the p-value is 0.1958. For the remaining
111 lags that were assessed, the p-values are less than 0.05,
thereby rejecting the null hypothesis of a white noise error structure.
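The Portmanteau test used here can be sketched as follows. The code computes the Ljung-Box Q statistic for a hypothetical residual series and compares it with a chi-square critical value. The simulated AR(1) residuals, the seed, and the function name `ljung_box_q` are assumptions made for illustration and are not the paper's residuals.

```python
import random


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic using autocorrelations at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    q = 0.0
    for k in range(1, max_lag + 1):
        num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
        rho_k = num / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q


# Hypothetical residuals: an AR(1) series with coefficient 0.8, so not white noise.
rng = random.Random(1)
resid = [0.0]
for _ in range(1999):
    resid.append(0.8 * resid[-1] + rng.gauss(0.0, 1.0))

CHI2_95_DF10 = 18.307  # 95th percentile of chi-square with 10 degrees of freedom
q10 = ljung_box_q(resid, 10)
print(q10 > CHI2_95_DF10)  # True means white noise is rejected at the 5% level
```

When the statistic is applied to the residuals of a fitted ARMA model, the degrees of freedom are usually reduced by the number of estimated ARMA parameters, so the critical value above is only indicative.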