2.4.1 Ensemble Kalman filter
In the EnKF, the relation of the observation \(\mathbf{Y}_{t}\) to the model-simulated state \({\hat{\mathbf{Y}}}_{t}\) can be described as:
\(\mathbf{Y}_{t}=\mathbf{H}{\hat{\mathbf{Y}}}_{t}+\mathbf{\varepsilon}_{t}\) (12)
where \(\mathbf{H}\) is the measurement operator that maps the model state to the observation space and \(\mathbf{\varepsilon}_{t}\) is the observation error. The ensembles of observations and model-state simulations at time \(t\) are stored in \(\mathbf{Y}_{t}\) and \({\hat{\mathbf{Y}}}_{t}\), respectively. Both have a dimension of \(N_{y}\times N_{\text{ens}}\), in which \(N_{y}\) and \(N_{\text{ens}}\) represent the dimension of the observed states and the ensemble size, respectively.
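For illustration, a minimal Python/NumPy sketch of the quantities in Eq. (12) is given below. All names, sizes, and values are hypothetical and are not taken from the original study; in particular, a separate model-state dimension (here called N_x) is introduced only to show how a selection-type \(\mathbf{H}\) maps the model state to the observation space, with one ensemble member stored per column.

```python
import numpy as np

# Illustrative shapes for Eq. (12); all sizes and values are hypothetical.
N_x, N_y, N_ens = 5, 2, 30                 # model-state size, observed-state size, ensemble size
rng = np.random.default_rng(0)
Y_hat = rng.normal(size=(N_x, N_ens))      # simulated model states, one member per column

# H maps model space to observation space; here a simple selection matrix
# that extracts the first two state variables as the observed quantities.
H = np.zeros((N_y, N_x))
H[0, 0] = 1.0
H[1, 1] = 1.0

HY_hat = H @ Y_hat                         # model-predicted observations, N_y x N_ens
```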
The observations \(\mathbf{Y}_{t,j}\) in ensemble \(\mathbf{Y}_{t}\) are drawn from an \(N_{y}\)-variate Gaussian distribution with mean equal to the actual observation, \(\mathbf{Y}_{t}^{y}\), and covariance equal to the observation error covariance \(\mathbf{R}_{t}\),
\(\mathbf{Y}_{t,j}=\mathbf{Y}_{t}^{y}+\mathbf{\sigma}_{t}^{y}\) (13)
in which \(\mathbf{\sigma}_{t}^{y}\sim\mathcal{N}\left(\mathbf{0},\mathbf{R}_{t}\right)\).
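A possible implementation of this perturbed-observation step (Eq. 13) is sketched below; the observation vector, the values of \(\mathbf{R}_{t}\), and the random seed are made up purely for illustration.

```python
import numpy as np

# Sketch of Eq. (13): each ensemble member j receives its own perturbed
# copy of the observation, y_t + sigma_j with sigma_j ~ N(0, R_t).
rng = np.random.default_rng(1)
N_y, N_ens = 2, 30                         # hypothetical sizes
y_obs = np.array([1.2, 0.4])               # actual observation Y_t^y
R = np.diag([0.05, 0.02])                  # observation error covariance R_t

sigma = rng.multivariate_normal(np.zeros(N_y), R, size=N_ens).T   # N_y x N_ens perturbations
Y_obs = y_obs[:, None] + sigma             # perturbed observation ensemble Y_t
```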
The model error covariance \(\mathbf{P}_{t}^{f}\) of \({\hat{\mathbf{Y}}}_{t}\) is calculated using:
\(\mathbf{P}_{t}^{f}=\left(N_{\text{ens}}-1\right)^{-1}\sum_{j=1}^{N_{\text{ens}}}{\left({\hat{\mathbf{Y}}}_{t,j}-\overline{\mathbf{Y}}_{t}^{f}\right)\left({\hat{\mathbf{Y}}}_{t,j}-\overline{\mathbf{Y}}_{t}^{f}\right)^{T}}\) (14)
in which \({\hat{\mathbf{Y}}}_{t,j}\) and \(\overline{\mathbf{Y}}_{t}^{f}\) represent a single simulated trajectory (ensemble member) and the mean of the ensemble \({\hat{\mathbf{Y}}}_{t}\), respectively, and the superscript \(T\) denotes the matrix transpose. The ensemble mean is calculated as \(\overline{\mathbf{Y}}_{t}^{f}=\frac{1}{N_{\text{ens}}}\sum_{j=1}^{N_{\text{ens}}}{\hat{\mathbf{Y}}}_{t,j}\).
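The sample covariance in Eq. (14) can be computed directly from the ensemble anomalies, as sketched below; the array names and sizes are again illustrative only.

```python
import numpy as np

# Sketch of Eq. (14): sample covariance of the forecast ensemble.
rng = np.random.default_rng(2)
N_y, N_ens = 2, 30
Y_hat = rng.normal(size=(N_y, N_ens))      # forecast ensemble, one member per column

Y_mean = Y_hat.mean(axis=1, keepdims=True) # ensemble mean, bar Y_t^f
A = Y_hat - Y_mean                         # anomalies (deviations from the mean)
P_f = A @ A.T / (N_ens - 1)                # model error covariance P_t^f

# Sanity check against NumPy's built-in unbiased covariance estimator.
assert np.allclose(P_f, np.cov(Y_hat))
```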
Under the linearity assumption, the updated analysis state \(\mathbf{Y}_{t}^{a}\) and its error covariance \(\mathbf{P}_{t}^{a}\) are calculated as:
\(\mathbf{Y}_{t}^{a}={\hat{\mathbf{Y}}}_{t}+\mathbf{K}\left(\mathbf{Y}_{t}-\mathbf{H}{\hat{\mathbf{Y}}}_{t}\right)\) (15)
\(\mathbf{P}_{t}^{a}=\left(\mathbf{I}-\mathbf{K}\mathbf{H}\right)\mathbf{P}_{t}^{f}\) (16)
in which \(\mathbf{I}\) is the identity matrix, and \(\mathbf{K}\) is the Kalman gain, defined as:
\(\mathbf{K}=\mathbf{P}_{t}^{f}\mathbf{H}^{T}\left(\mathbf{H}\mathbf{P}_{t}^{f}\mathbf{H}^{T}+\mathbf{R}_{t}\right)^{-1}\) (17)
where \(\mathbf{P}_{t}^{f}\mathbf{H}^{T}=\left(N_{\text{ens}}-1\right)^{-1}\sum_{j=1}^{N_{\text{ens}}}{\left({\hat{\mathbf{Y}}}_{t,j}-\overline{\mathbf{Y}}_{t}^{f}\right)\left(\mathbf{H}{\hat{\mathbf{Y}}}_{t,j}-\mathbf{H}\overline{\mathbf{Y}}_{t}^{f}\right)^{T}}\).
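Putting Eqs. (15)-(17) together, one possible ensemble implementation of the analysis step is sketched below. The function and variable names are not from the original study, the explicit matrix inverse is kept only for readability, and \(\mathbf{P}_{t}^{f}\mathbf{H}^{T}\) and \(\mathbf{H}\mathbf{P}_{t}^{f}\mathbf{H}^{T}\) are formed from the ensemble anomalies as in the expression above.

```python
import numpy as np

def enkf_update(Y_hat, Y_obs, H, R):
    """Sketch of the EnKF analysis step, Eqs. (15)-(17).

    Y_hat : forecast ensemble (state dim x N_ens)
    Y_obs : perturbed observation ensemble (N_y x N_ens), Eq. (13)
    H     : linear measurement operator (N_y x state dim)
    R     : observation error covariance (N_y x N_y)
    """
    N_ens = Y_hat.shape[1]
    Y_mean = Y_hat.mean(axis=1, keepdims=True)
    A = Y_hat - Y_mean                      # forecast anomalies
    HA = H @ A                              # anomalies mapped to observation space

    PfHT = A @ HA.T / (N_ens - 1)           # P_t^f H^T from the ensemble
    HPfHT = HA @ HA.T / (N_ens - 1)         # H P_t^f H^T from the ensemble
    K = PfHT @ np.linalg.inv(HPfHT + R)     # Kalman gain, Eq. (17)

    Y_a = Y_hat + K @ (Y_obs - H @ Y_hat)   # analysis ensemble, Eq. (15)
    P_f = A @ A.T / (N_ens - 1)             # forecast error covariance, Eq. (14)
    P_a = (np.eye(Y_hat.shape[0]) - K @ H) @ P_f   # analysis covariance, Eq. (16)
    return Y_a, P_a
```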
In summary, given an ensemble of model trajectories, the EnKF approximates the probability density of the model states at each time step \(t\). The updated mean of this ensemble represents the “best” state estimate, whereas the spread of the updated ensemble members provides a measure of the output uncertainty (Evensen, 1994).