Data Assimilation


  • Description: Mathematical methods for combining a dynamical model's forecast with observations to produce a best-estimate trajectory — sequential (Kalman filter family, ensemble Kalman filter, particle filter) vs variational (3D-Var, 4D-Var). Widely used in weather forecasting, oceanography, reservoir engineering, and SLAM.
  • My Notion Note ID: K2C-1-3
  • Created: 2020-06-01
  • Updated: 2026-05-22
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io

Table of Contents


1. The Problem

  • You have a dynamical model (set of ODEs/PDEs) for a physical system — e.g., the atmosphere, the ocean, an underground reservoir, a vehicle's pose.
  • You also have noisy observations of some of the state variables, scattered in time and space.
  • Neither alone tells the whole story:
    • The model drifts: small initial errors compound, parameterizations are imperfect.
    • The observations are sparse, noisy, and only see some variables — never the full state.
  • Data assimilation = combine the two. Produce a trajectory that is consistent with both the model's dynamics and the observed data, weighted by their respective uncertainties.

Two large algorithm families:

Family Idea Examples
Sequential Update the state estimate one observation (or batch) at a time as observations come in. Maintain a posterior estimate + uncertainty. Kalman filter, Extended KF, Ensemble KF, Particle filter.
Variational Minimize a cost functional over a time window combining model fit + observation fit. One optimization per assimilation window. 3D-Var, 4D-Var.

Hybrids combine elements of both (EnVar, 4DEnVar, hybrid 4D-Var).

2. The Bayesian Frame

Both families derive from Bayesian estimation:

p(x_t | y_{1:t}) ∝ p(y_t | x_t) · p(x_t | y_{1:t-1})

Where:

  • x_t — state at time t.
  • y_{1:t} — observations up to time t.
  • p(x_t | y_{1:t-1}) — the forecast (prior), produced by running the model from the previous estimate.
  • p(y_t | x_t) — the observation likelihood.
  • p(x_t | y_{1:t}) — the analysis (posterior).

Sequential methods compute the posterior recursively. Variational methods minimize the negative log-posterior over a window, with the model as a strong (4D-Var) or weak constraint.

3. Sequential Methods

3.1 Kalman Filter (KF)

  • Optimal for linear dynamics + linear observation operator + Gaussian noise.
  • Maintains (mean) and P (covariance) of the state.
  • Two-step cycle:
    • Forecast: x̂⁻ = F x̂ ; P⁻ = F P Fᵀ + Q (process noise covariance Q).
    • Analysis: Kalman gain K = P⁻ Hᵀ (H P⁻ Hᵀ + R)⁻¹ ; x̂ = x̂⁻ + K (y - H x̂⁻) ; P = (I - K H) P⁻ (observation noise covariance R, operator H).
  • Cost: O(n³) per step for the inverse — fine for n in low thousands, infeasible for n = 10⁹ (weather model state size).

3.2 Extended Kalman Filter (EKF)

  • For weakly nonlinear systems. Linearize F and H around the current estimate (Jacobians).
  • Used in early SLAM and navigation systems (GPS-INS fusion).
  • Breaks down with strong nonlinearity or bimodal posteriors.

3.3 Ensemble Kalman Filter (EnKF)

  • Represent the state distribution by an ensemble of N members (typically 50-200 in weather, 20-50 in ocean).
  • Forecast = propagate each ensemble member through the (possibly nonlinear) model.
  • Estimate covariance from the ensemble: P ≈ (1/(N-1)) Σ (x_i - x̄)(x_i - x̄)ᵀ.
  • No tangent-linear or adjoint code needed — huge implementation win.
  • Cost per step: O(N · model cost) for propagation + O(N · m²) for update with m observations.
  • Standard in operational weather forecasting (ECMWF, NCEP, JMA all run ensemble systems).

3.4 Particle Filter (Sequential Monte Carlo)

  • Represent posterior as weighted set of particles. No Gaussian assumption.
  • Forecast: propagate each particle.
  • Analysis: reweight by observation likelihood; resample to avoid degeneracy.
  • Excellent for strongly nonlinear / non-Gaussian problems.
  • Suffers from "curse of dimensionality" — number of particles needed grows exponentially with state dimension. Rarely used at full state-vector scale; common for low-dim subproblems (e.g., target tracking).

4. Variational Methods

4.1 3D-Var

  • Single-time optimization. Find state x minimizing:

    J(x) = (x - x_b)ᵀ B⁻¹ (x - x_b) + (y - H(x))ᵀ R⁻¹ (y - H(x))

    with x_b = background (prior forecast), B = background error covariance, R = observation error covariance.

  • The two terms penalize disagreement with the background and the observations respectively.

  • Solved with conjugate gradient or quasi-Newton (L-BFGS); requires Hᵀ (adjoint of the observation operator) but not the model's tangent-linear/adjoint.

4.2 4D-Var

  • Like 3D-Var but the model is part of the constraint. Minimize over the initial state x_0 of a window:

    J(x_0) = (x_0 - x_b)ᵀ B⁻¹ (x_0 - x_b) + Σ_t (y_t - H_t M_{0,t}(x_0))ᵀ R_t⁻¹ (y_t - H_t M_{0,t}(x_0))

    where M_{0,t} propagates the model from t=0 to t.

  • Requires the adjoint model Mᵀ — substantial software engineering effort to maintain alongside the forward model.

  • Operational backbone at ECMWF, Met Office, Météo-France for global NWP since the early 2000s.

  • "Strong constraint" 4D-Var assumes the model is perfect inside the window; "weak constraint" 4D-Var adds a model-error term.

4.3 Sequential vs Variational

  • Sequential advantages: incremental, no need to wait for a whole window, no adjoint required (EnKF). Disadvantage: only the present is updated.
  • Variational advantages: uses all observations in the window self-consistently, propagates information backward in time through the adjoint. Disadvantage: adjoint code maintenance, batch latency.

5. Hybrid Methods

Modern operational systems combine both:

  • EnVar / Hybrid-Var — use the ensemble to estimate the flow-dependent background error covariance B, then plug into a variational solver.
  • 4DEnVar — 4D variational analysis with covariance localized via an ensemble; avoids the adjoint by using ensemble perturbations to estimate the tangent linear.
  • ECMWF runs EDA + 4D-Var (Ensemble of Data Assimilations + deterministic 4D-Var). NCEP runs hybrid 4DEnVar.

6. Connection to Filtering and SLAM

The same math powers other fields under different names:

  • Robotics / SLAM — pose-graph optimization is essentially weak-constraint 4D-Var. EKF-SLAM, Particle-filter-SLAM are sequential methods. Modern systems (ORB-SLAM, VINS-Fusion) use sliding-window bundle adjustment ≈ short-window 4D-Var.
  • Reservoir engineering — history matching = 4D-Var; ensemble methods (EnKF, ES-MDA) widely used.
  • Target tracking — Kalman variants; multi-target uses Joint Probabilistic Data Association or Multiple Hypothesis Tracking (combinatorial structure on top of Bayesian filtering).
  • State-space modeling in econometrics / signal processing — same filtering theory; smaller scale.

7. Pitfalls

  • B matrix dimensionality — for a global weather model with state dim ~10⁹, storing B explicitly is impossible (10¹⁸ entries). Operational systems use spectral / wavelet / ensemble-based representations.
  • Localization — long-range spurious correlations in small ensembles. Apply Schur-product localization (multiply ensemble covariance by a compactly supported function).
  • Inflation — small ensembles underestimate variance; multiplicatively inflate the ensemble spread (typical factor 1.05-1.15).
  • Observation bias — assumed-unbiased observations are usually biased. Bias correction (VarBC) adds bias parameters to the variational cost.
  • Filter divergence — analysis becomes overconfident; observations get ignored; trajectory drifts. Often caused by underestimated P (inadequate inflation) or missing model error.
  • Strongly non-Gaussian states (e.g., precipitation, sea ice concentration on [0,1]) — Gaussian-based methods misbehave. Use transformations (anamorphosis), Gamma observation operators, or particle methods.
  • Adjoint maintenance — adjoint code drifts from the forward model unless tested rigorously (twin experiments, gradient checks).
  • The two-camp confusion — in the source the original note distinguished sequential (序贯) — propagates state forward, updates as observations arrive — from variational (变分) — optimizes the whole window using all observations together. The distinction is real and explained more carefully above (§ 3 vs § 4).

8. References