Deeper Understanding of Kalman Filter

State space model

The Kalman filter is the “optimal solution” of the state space model,

$$\mathbf{x}_t = \mathbf{F}_t \mathbf{x}_{t-1} + \mathbf{B}_t \mathbf{u}_t + \mathbf{w}_t$$

$$\mathbf{z}_t = \mathbf{H}_t \mathbf{x}_t + \mathbf{v}_t$$

where $\boldsymbol{P}_t$ is the covariance matrix of the Gaussian distribution of the state estimate, $\mathbf{w}_t \sim N(0, \mathbf{Q}_t)$ is an additive noise on the predicted state’s PDF, and $\mathbf{v}_t \sim N(0, \mathbf{R}_t)$ is an additive noise on the measurement’s PDF.

The Kalman filter has two strong assumptions:

  • $\mathbf{x}_t$, $\mathbf{z}_t$, $\mathbf{w}_t$, $\mathbf{v}_t$ are Gaussian distributed;
  • the two equations (state and measurement) are linear.

With these two assumptions, every quantity we deal with recursively (over the time series) is Gaussian. Their PDFs are fully specified by a mean and a covariance, so they are easy to compute. The optimal estimate is simply the mean of the PDF.
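As a concrete illustration of the two linear-Gaussian equations, here is a minimal simulation sketch. The constant-velocity matrices `F`, `H`, `Q`, `R` and all dimensions are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition: (position, velocity)
H = np.array([[1.0, 0.0]])              # we only observe position
Q = 0.01 * np.eye(2)                    # process noise covariance Q_t
R = np.array([[0.25]])                  # measurement noise covariance R_t

x = np.array([0.0, 1.0])                # true initial state (assumed)
states, measurements = [], []
for _ in range(20):
    # state equation: x_t = F x_{t-1} + w_t, with w_t ~ N(0, Q)
    x = F @ x + rng.multivariate_normal(np.zeros(2), Q)
    # measurement equation: z_t = H x_t + v_t, with v_t ~ N(0, R)
    z = H @ x + rng.multivariate_normal(np.zeros(1), R)
    states.append(x)
    measurements.append(z)
```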

Kalman filtering

Reference: Understanding the Basis of the Kalman Filter Via a Simple and Intuitive Derivation [simple]

Prediction

Assume that we have updated the hidden state at time $t-1$, obtaining $\hat{\mathbf{x}}_{t-1|t-1}$ (the estimate of the state value) and $\mathbf{P}_{t-1|t-1}$ (the covariance of the state estimate). The state at time $t-1$ thus follows the multivariate Gaussian distribution $\text{N}(\hat{\mathbf{x}}_{t-1|t-1}, \mathbf{P}_{t-1|t-1})$.

The first step of Kalman filtering is to predict the next state at time $t$ without the observation. Note that we do not estimate the optimal value directly, but the PDF instead. From the state equation, we know that the predicted state follows the Gaussian distribution $\text{N}(\hat{\mathbf{x}}_{t|t-1}, \mathbf{P}_{t|t-1})$, where

$$\hat{\mathbf{x}}_{t|t-1} = \mathbf{F}_t \hat{\mathbf{x}}_{t-1|t-1} + \mathbf{B}_t \mathbf{u}_t, \qquad \mathbf{P}_{t|t-1} = \mathbf{F}_t \mathbf{P}_{t-1|t-1} \mathbf{F}_t^T + \mathbf{Q}_t.$$

The derivation of $\mathbf{P}_{t|t-1}$ can be found in [simple]. $\mathbf{P}_{t|t-1}$ is the covariance of the predicted PDF; it will be compared with $\mathbf{R}_t$, the uncertainty in the measurement, to determine how much confidence to place in the prediction.
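A minimal sketch of this prediction step in code; the function `predict` and its signature are assumptions for illustration, under the standard notation above:

```python
import numpy as np

def predict(x_post, P_post, F, Q, B=None, u=None):
    """Propagate N(x_{t-1|t-1}, P_{t-1|t-1}) through the state equation."""
    x_prior = F @ x_post                # \hat{x}_{t|t-1} = F \hat{x}_{t-1|t-1}
    if B is not None and u is not None:
        x_prior = x_prior + B @ u       # optional control input B u
    P_prior = F @ P_post @ F.T + Q      # P_{t|t-1} = F P_{t-1|t-1} F^T + Q
    return x_prior, P_prior
```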

Measurement update

Our main focus is on the measurement domain, so we project the state-domain distributions into it; this gives $\text{PDF}_1$ and $\text{PDF}_3$ below.

With the predicted PDF of ${\mathbf{x}}_{t|t-1}$, we can get the predicted PDF of the measurement, $\text{PDF}_1 = \text{N}(\mathbf{H}_t \hat{\mathbf{x}}_{t|t-1}, \mathbf{H}_t \mathbf{P}_{t|t-1} \mathbf{H}_t^T)$. This is the first Gaussian distribution in this part, and it is projected from the state domain.

The measurement vector $\mathbf{z}_{t}$ also follows a Gaussian distribution, $\text{PDF}_2 = \text{N}(\mathbf{z}_t, \mathbf{R}_{t})$.

The updated state also has a projected Gaussian distribution, $\text{PDF}_3 = \text{N}(\mathbf{H}_t \hat{\mathbf{x}}_{t|t}, \mathbf{H}_t \mathbf{P}_{t|t} \mathbf{H}_t^T)$.

The best $\text{PDF}_3$ equals $\text{PDF}_1 \times \text{PDF}_2$ (up to normalization). It is a combination of prediction and measurement.

We know that the product of two multivariate Gaussian distributions has the following properties (see the numerical check below),

  • $\Sigma_3 = (\Sigma_1^{-1} + \Sigma_2^{-1})^{-1}$
  • $\mu_3 = \Sigma_3 \Sigma_1^{-1}\mu_1 + \Sigma_3 \Sigma_2^{-1}\mu_2$

Product of two multivariate Gaussian distributions
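A small numerical check of these two formulas; this is an illustration, not the source’s code, and the example means and covariances are made up:

```python
import numpy as np

def gaussian_product(mu1, S1, mu2, S2):
    """Mean and covariance of N(mu1, S1) * N(mu2, S2), up to normalization."""
    S1_inv, S2_inv = np.linalg.inv(S1), np.linalg.inv(S2)
    S3 = np.linalg.inv(S1_inv + S2_inv)           # Sigma_3
    mu3 = S3 @ (S1_inv @ mu1 + S2_inv @ mu2)      # mu_3
    return mu3, S3

mu3, S3 = gaussian_product(np.array([0.0, 0.0]), np.eye(2),
                           np.array([1.0, 2.0]), 2.0 * np.eye(2))
# For these isotropic inputs the result interpolates the means:
# mu3 == [1/3, 2/3] and S3 == (2/3) * I.
```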

According to the Kailath Variant in the Matrix Cookbook, we can derive the updated covariance as an example,

$$\mathbf{P}_{t|t} = (\mathbf{I} - \mathbf{K}_t \mathbf{H}_t)\,\mathbf{P}_{t|t-1},$$

where the Kalman gain is defined as

$$\mathbf{K}_t = \mathbf{P}_{t|t-1} \mathbf{H}_t^T \left(\mathbf{H}_t \mathbf{P}_{t|t-1} \mathbf{H}_t^T + \mathbf{R}_t\right)^{-1}.$$
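A minimal sketch of the measurement update in the Kalman-gain form above; the function name and structure are assumptions for illustration:

```python
import numpy as np

def update(x_prior, P_prior, z, H, R):
    """Fuse the predicted PDF N(x_{t|t-1}, P_{t|t-1}) with the measurement."""
    S = H @ P_prior @ H.T + R                 # innovation covariance H P H^T + R
    K = P_prior @ H.T @ np.linalg.inv(S)      # Kalman gain K_t
    x_post = x_prior + K @ (z - H @ x_prior)  # updated mean \hat{x}_{t|t}
    P_post = (np.eye(len(x_prior)) - K @ H) @ P_prior  # P_{t|t} = (I - K H) P_{t|t-1}
    return x_post, P_post
```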

Kalman Filtering

PRML

Two important instances of the state space model are the hidden Markov model, in which the latent variables are discrete, and the linear dynamical system, in which the latent variables are Gaussian.

Simple and Intuitive Derivation

The difference between the Kalman filter and the HMM, which is worth noting when studying EM:

The Kalman filter is an algorithm permitting exact inference in a linear dynamical system, which is a Bayesian model similar to a hidden Markov model but where the state space of the latent variables is continuous and where all latent and observed variables have a Gaussian distribution (often a multivariate Gaussian distribution).

The Kalman filter provides an algorithm to determine an estimate of $\mathbf{x}_t$ by combining models of the system and noisy measurements of certain parameters or linear functions of parameters.

The estimates of the parameters of interest in the state vector are therefore now provided by probability density functions (pdfs), rather than discrete values.

$\boldsymbol{P}_t$ is the covariance matrix of the Gaussian function of the state estimation. The terms along the main diagonal of $\boldsymbol{P}_t$ are the variances associated with the corresponding terms in the state vector. The off-diagonal terms of $\boldsymbol{P}_t$ provide the covariances between terms in the state vector.

$\mathbf{w}_t$ is the vector containing the process noise terms for each parameter in the state vector. The process noise is assumed to be drawn from a zero mean multivariate normal distribution with covariance given by the covariance matrix $\mathbf{Q}_t$.

Prediction

The state space form describes the true hidden states and observations exactly: the true state is $x_t = F x_{t-1} + B_t u_t + w_t$, while the predicted state $\hat x_{t|t-1} = F \hat x_{t-1|t-1} + B_t u_t$ omits the noise term.

The covariance associated with the prediction $\hat{x}_{t|t-1}$ of the unknown true value $x_t$ is given by $P_{t|t-1} = \text{E}[(x_t-\hat{x}_{t|t-1})(x_t-\hat{x}_{t|t-1})^T]$

The derivation of extrapolated process noise can be found in [simple].
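Sketching that step here for completeness: subtracting the predicted state from the state equation gives $x_t - \hat{x}_{t|t-1} = F(x_{t-1} - \hat{x}_{t-1|t-1}) + w_t$ (the control term cancels), and since $w_t$ is zero mean and independent of the estimation error,

$$P_{t|t-1} = F\,\text{E}\!\left[(x_{t-1}-\hat{x}_{t-1|t-1})(x_{t-1}-\hat{x}_{t-1|t-1})^T\right]F^T + \text{E}[w_t w_t^T] = F P_{t-1|t-1} F^T + Q_t.$$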

Measurement Update

A key property of the Gaussian function is exploited at this point: the product of two Gaussian functions is another Gaussian function. This is critical as it permits an endless number of Gaussian pdfs to be multiplied over time, but the resulting function does not increase in complexity or number of terms; after each time epoch the new pdf is fully represented by a Gaussian function. This is the key to the elegant recursive properties of the Kalman filter.
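A one-dimensional toy illustration of this closure property (not from the source; the measurement values and variances are made up):

```python
# Repeatedly multiplying Gaussian PDFs only ever updates one mean and one
# variance, so the representation never grows in complexity.
mu, var = 0.0, 10.0                      # prior N(mu, var)
r = 0.5                                  # made-up measurement variance
for z in [1.2, 0.9, 1.1, 1.0]:           # made-up measurements
    var_new = 1.0 / (1.0 / var + 1.0 / r)  # Sigma_3 formula in 1D
    mu = var_new * (mu / var + z / r)       # mu_3 formula in 1D
    var = var_new
# After each product the posterior is still a single Gaussian N(mu, var).
```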

Statistics and Econometric Models

The Kalman filter and the Kalman smoother can be used to compute conditional expectations in a state space framework.

The state equation defines completely the distribution of $\beta_t$ for every $t\ge 1$.

There are three kinds of conditional expectation problems to determine numerically:

  • Filtering: $E(\beta_t|z_1,\dots,z_t)$ is the optimal approximation of $\beta_t$ given the information available at time $t$ (see the sketch after this list);
  • Smoothing: $E(\beta_s|z_1,\dots,z_t)$, $s<t$, is the optimal approximation of $\beta_s$ given the information available at time $t$;
  • Forecasting: $E(\beta_s|z_1,\dots,z_t)$ or $E(z_s|z_1,\dots,z_t)$, $s>t$.
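As a sketch of the filtering problem only, here is a complete predict/update loop. It reuses the hypothetical constant-velocity matrices from the earlier sketches, and the measurements are synthetic:

```python
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # illustrative model, as above
H = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), np.array([[0.25]])

rng = np.random.default_rng(1)
zs = [np.array([t + rng.normal(0.0, 0.5)]) for t in range(1, 11)]  # fake data

x_hat, P = np.zeros(2), np.eye(2)        # \hat{x}_{0|0}, P_{0|0}
for z in zs:
    # predict: N(\hat{x}_{t|t-1}, P_{t|t-1})
    x_hat, P = F @ x_hat, F @ P @ F.T + Q
    # update: fuse prediction and measurement via the Kalman gain
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x_hat = x_hat + K @ (z - H @ x_hat)
    P = (np.eye(2) - K @ H) @ P
# x_hat now approximates the filtering expectation E(beta_t | z_1, ..., z_t).
```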