Description: Mathematical methods for combining a dynamical model's forecast with observations to produce a best-estimate trajectory — sequential (Kalman filter family, ensemble Kalman filter, particle filter) vs variational (3D-Var, 4D-Var). Widely used in weather forecasting, oceanography, reservoir engineering, and SLAM.
My Notion Note ID: K2C-1-3
Created: 2020-06-01
Updated: 2026-05-22
License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io

1. The Problem

You have a dynamical model (set of ODEs/PDEs) for a physical system — e.g., the atmosphere, the ocean, an underground reservoir, a vehicle's pose.
You also have noisy observations of some of the state variables, scattered in time and space.
Neither alone tells the whole story:
- The model drifts: small initial errors compound, parameterizations are imperfect.
- The observations are sparse, noisy, and only see some variables — never the full state.
Data assimilation = combine the two. Produce a trajectory that is consistent with both the model's dynamics and the observed data, weighted by their respective uncertainties.

Two large algorithm families:

Family	Idea	Examples
Sequential	Update the state estimate one observation (or batch) at a time as observations come in. Maintain a posterior estimate + uncertainty.	Kalman filter, Extended KF, Ensemble KF, Particle filter.
Variational	Minimize a cost functional over a time window combining model fit + observation fit. One optimization per assimilation window.	3D-Var, 4D-Var.

Hybrids combine elements of both (EnVar, 4DEnVar, hybrid 4D-Var).

2. The Bayesian Frame

Both families derive from Bayesian estimation:

p(x_t | y_{1:t}) ∝ p(y_t | x_t) · p(x_t | y_{1:t-1})

Where:

x_t — state at time t.
y_{1:t} — observations up to time t.
p(x_t | y_{1:t-1}) — the forecast (prior), produced by running the model from the previous estimate.
p(y_t | x_t) — the observation likelihood.
p(x_t | y_{1:t}) — the analysis (posterior).

Sequential methods compute the posterior recursively. Variational methods minimize the negative log-posterior over a window, with the model as a strong (4D-Var) or weak constraint.

3. Sequential Methods

3.1 Kalman Filter (KF)

Optimal for linear dynamics + linear observation operator + Gaussian noise.
Maintains x̂ (mean) and P (covariance) of the state.
Two-step cycle:
- Forecast: x̂⁻ = F x̂ ; P⁻ = F P Fᵀ + Q (process noise covariance Q).
- Analysis: Kalman gain K = P⁻ Hᵀ (H P⁻ Hᵀ + R)⁻¹ ; x̂ = x̂⁻ + K (y - H x̂⁻) ; P = (I - K H) P⁻ (observation noise covariance R, operator H).
Cost: O(n³) per step for the inverse — fine for n in low thousands, infeasible for n = 10⁹ (weather model state size).

3.2 Extended Kalman Filter (EKF)

For weakly nonlinear systems. Linearize F and H around the current estimate (Jacobians).
Used in early SLAM and navigation systems (GPS-INS fusion).
Breaks down with strong nonlinearity or bimodal posteriors.

3.3 Ensemble Kalman Filter (EnKF)

Represent the state distribution by an ensemble of N members (typically 50-200 in weather, 20-50 in ocean).
Forecast = propagate each ensemble member through the (possibly nonlinear) model.
Estimate covariance from the ensemble: P ≈ (1/(N-1)) Σ (x_i - x̄)(x_i - x̄)ᵀ.
No tangent-linear or adjoint code needed — huge implementation win.
Cost per step: O(N · model cost) for propagation + O(N · m²) for update with m observations.
Standard in operational weather forecasting (ECMWF, NCEP, JMA all run ensemble systems).

3.4 Particle Filter (Sequential Monte Carlo)

Represent posterior as weighted set of particles. No Gaussian assumption.
Forecast: propagate each particle.
Analysis: reweight by observation likelihood; resample to avoid degeneracy.
Excellent for strongly nonlinear / non-Gaussian problems.
Suffers from "curse of dimensionality" — number of particles needed grows exponentially with state dimension. Rarely used at full state-vector scale; common for low-dim subproblems (e.g., target tracking).

4. Variational Methods

4.1 3D-Var

Single-time optimization. Find state x minimizing:

J(x) = (x - x_b)ᵀ B⁻¹ (x - x_b) + (y - H(x))ᵀ R⁻¹ (y - H(x))

with x_b = background (prior forecast), B = background error covariance, R = observation error covariance.
The two terms penalize disagreement with the background and the observations respectively.
Solved with conjugate gradient or quasi-Newton (L-BFGS); requires Hᵀ (adjoint of the observation operator) but not the model's tangent-linear/adjoint.

4.2 4D-Var

Like 3D-Var but the model is part of the constraint. Minimize over the initial state x_0 of a window:

J(x_0) = (x_0 - x_b)ᵀ B⁻¹ (x_0 - x_b) + Σ_t (y_t - H_t M_{0,t}(x_0))ᵀ R_t⁻¹ (y_t - H_t M_{0,t}(x_0))

where M_{0,t} propagates the model from t=0 to t.
Requires the adjoint model Mᵀ — substantial software engineering effort to maintain alongside the forward model.
Operational backbone at ECMWF, Met Office, Météo-France for global NWP since the early 2000s.
"Strong constraint" 4D-Var assumes the model is perfect inside the window; "weak constraint" 4D-Var adds a model-error term.

4.3 Sequential vs Variational

Sequential advantages: incremental, no need to wait for a whole window, no adjoint required (EnKF). Disadvantage: only the present is updated.
Variational advantages: uses all observations in the window self-consistently, propagates information backward in time through the adjoint. Disadvantage: adjoint code maintenance, batch latency.

5. Hybrid Methods

Modern operational systems combine both:

EnVar / Hybrid-Var — use the ensemble to estimate the flow-dependent background error covariance B, then plug into a variational solver.
4DEnVar — 4D variational analysis with covariance localized via an ensemble; avoids the adjoint by using ensemble perturbations to estimate the tangent linear.
ECMWF runs EDA + 4D-Var (Ensemble of Data Assimilations + deterministic 4D-Var). NCEP runs hybrid 4DEnVar.

6. Connection to Filtering and SLAM

The same math powers other fields under different names:

Robotics / SLAM — pose-graph optimization is essentially weak-constraint 4D-Var. EKF-SLAM, Particle-filter-SLAM are sequential methods. Modern systems (ORB-SLAM, VINS-Fusion) use sliding-window bundle adjustment ≈ short-window 4D-Var.
Reservoir engineering — history matching = 4D-Var; ensemble methods (EnKF, ES-MDA) widely used.
Target tracking — Kalman variants; multi-target uses Joint Probabilistic Data Association or Multiple Hypothesis Tracking (combinatorial structure on top of Bayesian filtering).
State-space modeling in econometrics / signal processing — same filtering theory; smaller scale.

7. Pitfalls

B matrix dimensionality — for a global weather model with state dim ~10⁹, storing B explicitly is impossible (10¹⁸ entries). Operational systems use spectral / wavelet / ensemble-based representations.
Localization — long-range spurious correlations in small ensembles. Apply Schur-product localization (multiply ensemble covariance by a compactly supported function).
Inflation — small ensembles underestimate variance; multiplicatively inflate the ensemble spread (typical factor 1.05-1.15).
Observation bias — assumed-unbiased observations are usually biased. Bias correction (VarBC) adds bias parameters to the variational cost.
Filter divergence — analysis becomes overconfident; observations get ignored; trajectory drifts. Often caused by underestimated P (inadequate inflation) or missing model error.
Strongly non-Gaussian states (e.g., precipitation, sea ice concentration on [0,1]) — Gaussian-based methods misbehave. Use transformations (anamorphosis), Gamma observation operators, or particle methods.
Adjoint maintenance — adjoint code drifts from the forward model unless tested rigorously (twin experiments, gradient checks).
The two-camp confusion — in the source the original note distinguished sequential (序贯) — propagates state forward, updates as observations arrive — from variational (变分) — optimizes the whole window using all observations together. The distinction is real and explained more carefully above (§ 3 vs § 4).

8. References

Asch, Bocquet, Nodet, Data Assimilation: Methods, Algorithms, and Applications, SIAM, 2016. (Modern textbook treatment.)
Kalnay, Atmospheric Modeling, Data Assimilation and Predictability, Cambridge, 2003. (Classic.)
Evensen, Data Assimilation: The Ensemble Kalman Filter, Springer, 2009.
Particle filter for non-Gaussian DA — Van Leeuwen, "Particle filtering in geophysical systems," 2009.
ECMWF operational DA documentation — https://www.ecmwf.int/en/research/data-assimilation
Wikipedia: Data assimilation — https://en.wikipedia.org/wiki/Data_assimilation
Wikipedia: Kalman filter — https://en.wikipedia.org/wiki/Kalman_filter
Wikipedia: Ensemble Kalman filter — https://en.wikipedia.org/wiki/Ensemble_Kalman_filter
For SLAM / bundle adjustment connection — Triggs et al., "Bundle Adjustment — A Modern Synthesis," 2000.

Table of Contents