Geometric Integration & Hamiltonian Monte Carlo
Euler and Runge-Kutta treat the state space as flat: they chase down local truncation error and wipe out the global invariants along the way. Run them on a Hamiltonian system and the energy drifts until the simulation blows up.
In HMC that energy drift breaks Detailed Balance and wrecks the sampler.
We want integrators that hold onto structure like symplecticity and volume and momentum by design and not by accident, and that is what geometric integration is. These notes build up the geometry of cotangent bundles and shadow Hamiltonians and then poke at the practical edge cases you hit when you actually implement the thing.
1. The Symplectic Setup
A Hamiltonian system lives on a symplectic manifold, and usually that manifold is the cotangent bundle $T^*Q$ with coordinates $(q, p)$ and the standard 2-form $\omega = \sum_i dq^i \wedge dp_i$.
Given a Hamiltonian $H(q, p)$, the vector field $X_H$ satisfies $\iota_{X_H}\omega = dH$, and in coordinates this gives Hamilton's equations

$$\dot q = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial q}.$$

Or you write it as $\dot z = J \nabla H(z)$ with $z = (q, p)$ and $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$.
The symplectic structure hands the phase-space functions a Lie algebra through the Poisson bracket $\{F, G\} = \sum_i \left( \frac{\partial F}{\partial q^i} \frac{\partial G}{\partial p_i} - \frac{\partial F}{\partial p_i} \frac{\partial G}{\partial q^i} \right)$.
A flow $\phi_t$ is symplectic when $\phi_t^* \omega = \omega$, and this is the same as saying $(D\phi_t)^\top J \, (D\phi_t) = J$. Every Hamiltonian flow is symplectic and by Liouville's theorem it holds the volume form $\omega^n$ fixed too.
2. Splitting and Why Leapfrog Works
For a general $H$ we cannot solve the flow exactly. But we can split it. If $H = T(p) + V(q)$ and $T$ and $V$ are each integrable on their own, then their exact flows $\phi_t^T$ and $\phi_t^V$ are symplectic, and stacking symplectic maps together gives you another symplectic map.
Leapfrog is the symmetric splitting

$$\Phi_h = \phi^V_{h/2} \circ \phi^T_{h} \circ \phi^V_{h/2},$$

i.e. a half kick $p \leftarrow p - \frac{h}{2} \nabla V(q)$, a full drift $q \leftarrow q + h \, M^{-1} p$, and another half kick.
Each sub-map is a shear: its Jacobian is a triangular matrix with unit diagonal, so the determinant is exactly 1. Symplecticity holds for any step size, stable or not. The geometry is exact and only the trajectory gets approximated.
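This volume preservation is easy to verify numerically. A toy check below, using the pendulum potential $V(q) = -\cos q$ as an arbitrary nonlinear choice: estimate the Jacobian of one leapfrog step by central differences and confirm the determinant is 1.

```python
import numpy as np

# Toy check that one leapfrog step is volume-preserving for a nonlinear
# potential (pendulum, V(q) = -cos q, so V'(q) = sin q). The Jacobian of
# the step map is estimated by central differences; its determinant
# should equal 1 up to finite-difference error, even at a large step size.

def leapfrog_step(q, p, h, grad_V):
    p = p - (h / 2) * grad_V(q)
    q = q + h * p
    p = p - (h / 2) * grad_V(q)
    return q, p

def step_jacobian(q, p, h, grad_V, eps=1e-6):
    def F(z):
        return np.array(leapfrog_step(z[0], z[1], h, grad_V))
    z0 = np.array([q, p])
    J = np.zeros((2, 2))
    for i in range(2):
        dz = np.zeros(2)
        dz[i] = eps
        J[:, i] = (F(z0 + dz) - F(z0 - dz)) / (2 * eps)
    return J

J = step_jacobian(q=0.7, p=0.3, h=0.5, grad_V=np.sin)
print(np.linalg.det(J))   # ~1.0 even at this large step size
```

The determinant is exactly 1 analytically; the residual you see is pure finite-difference noise.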
3. Lie Group Integrators (Munthe-Kaas)
The standard update $y_{n+1} = y_n + h f(y_n)$ assumes the state space is closed under addition and that falls apart on a Lie group like $SO(3)$ because adding a tangent vector to a rotation matrix spits out something that is no longer a rotation matrix.
The fix is to use the exponential map from the Lie algebra:

$$y_{n+1} = \exp(h \, \Omega_n) \, y_n,$$

where $\Omega_n$ lives in the Lie algebra $\mathfrak{g}$. Munthe-Kaas or RKMK methods push Runge-Kutta onto manifolds by solving the ODE inside the Lie algebra and mapping back through $\exp$. Rotations stay rotations and rigid bodies keep their geometric constraints.
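A minimal sketch of the idea on $SO(3)$, assuming a constant body angular velocity `omega` so a single exponential per step suffices (real RKMK methods also update the algebra element per stage). The exponential of a skew matrix is computed with Rodrigues' formula, so every update is an exact rotation and $R$ stays on the group, while the naive additive update drifts off it.

```python
import numpy as np

# Lie group Euler step for a rotating frame with constant body angular
# velocity omega (a toy case). expm of the skew matrix uses Rodrigues'
# formula, so R_lie stays orthogonal; the additive update R += h*R*hat(w)
# leaves SO(3) and its orthogonality error compounds.

def hat(w):
    # Map R^3 -> so(3): skew-symmetric matrix with hat(w) @ v = w x v.
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def expm_so3(w):
    # Rodrigues' formula for exp(hat(w)).
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    K = hat(w / th)
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)

omega = np.array([0.3, -0.2, 0.9])
h = 0.1
R_lie = np.eye(3)
R_naive = np.eye(3)
for _ in range(1000):
    R_lie = R_lie @ expm_so3(h * omega)            # stays on SO(3)
    R_naive = R_naive + h * R_naive @ hat(omega)   # drifts off SO(3)

print(np.linalg.norm(R_lie.T @ R_lie - np.eye(3)))    # tiny: roundoff only
print(np.linalg.norm(R_naive.T @ R_naive - np.eye(3)))  # large: left the group
```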
4. Variational Integrators
Lagrangian mechanics extremizes the action

$$S[q] = \int_0^T L(q, \dot q) \, dt.$$

Discretize the action directly and not the ODE: choose a discrete Lagrangian $L_d(q_k, q_{k+1}) \approx \int_{t_k}^{t_{k+1}} L \, dt$ and sum $S_d = \sum_k L_d(q_k, q_{k+1})$.
Apply Hamilton's principle ($\delta S_d = 0$) and you get the discrete Euler-Lagrange equations

$$D_2 L_d(q_{k-1}, q_k) + D_1 L_d(q_k, q_{k+1}) = 0.$$
The key point is that every variational integrator is symplectic because it chops up the variational principle instead of the differential equation. The discrete symplectic form falls out for free. Variational integration is the standard in orbital mechanics and celestial simulation and I think it is underappreciated in the ML community.
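For the rectangle-rule discrete Lagrangian $L_d(a, b) = h\left[\frac{1}{2}\left(\frac{b-a}{h}\right)^2 - V(a)\right]$, the discrete Euler-Lagrange equations collapse to the Störmer-Verlet recurrence $q_{k+1} = 2 q_k - q_{k-1} - h^2 V'(q_k)$. A quick numerical check (pendulum potential and step size are illustrative choices) that this position-only recurrence reproduces leapfrog's positions:

```python
import numpy as np

# The DEL equations for the rectangle-rule discrete Lagrangian reduce to
# q_{k+1} = 2 q_k - q_{k-1} - h^2 V'(q_k) (Stormer-Verlet). Check that
# this matches the positions of kick-drift-kick leapfrog on the pendulum.

h = 0.1
grad_V = np.sin          # pendulum, V(q) = -cos q
q0, p0 = 0.5, 0.2

# Leapfrog positions.
q, p = q0, p0
lf = [q]
for _ in range(200):
    p = p - (h / 2) * grad_V(q)
    q = q + h * p
    p = p - (h / 2) * grad_V(q)
    lf.append(q)

# Variational (DEL) recurrence, seeded with the first two leapfrog points.
vi = [lf[0], lf[1]]
for _ in range(199):
    vi.append(2 * vi[-1] - vi[-2] - h**2 * grad_V(vi[-1]))

print(max(abs(a - b) for a, b in zip(lf, vi)))   # agree to roundoff
```

The agreement is exact in exact arithmetic: subtracting consecutive leapfrog drift equations eliminates the momenta and leaves precisely the DEL recurrence.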
5. Shadow Hamiltonians
What does Leapfrog actually solve? Not $H$ exactly. Instead there is a modified Hamiltonian $\tilde H_h$ whose exact flow runs through the numerical points $(q_n, p_n)$.
Write the Leapfrog operator using Lie operators as $\Phi_h = e^{\frac{h}{2} \hat V} \, e^{h \hat T} \, e^{\frac{h}{2} \hat V}$, where $\hat F = \{\cdot, F\}$.
By Baker-Campbell-Hausdorff this equals $e^{h \hat{\tilde H}_h}$ with

$$\tilde H_h = T + V + \frac{h^2}{12} \{T, \{T, V\}\} - \frac{h^2}{24} \{V, \{V, T\}\} + O(h^4),$$

and for a symmetric splitting the error terms show up only in even powers of $h$.
For separable $H = \frac{1}{2} p^\top M^{-1} p + V(q)$ this reads

$$\tilde H_h = H + \frac{h^2}{12} \, p^\top M^{-1} \nabla^2 V(q) \, M^{-1} p - \frac{h^2}{24} \, \nabla V(q)^\top M^{-1} \nabla V(q) + O(h^4).$$

$\tilde H_h$ is formally an infinite series and in general it diverges. But for small $h$ truncation works fine. For a harmonic oscillator with frequency $\omega$, stability asks for $h \omega < 2$.
Past that threshold the shadow Hamiltonian stops existing and the dynamics turn chaotic.
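That threshold is easy to see numerically. A minimal sketch on the unit-frequency harmonic oscillator, comparing a step size just below $h\omega = 2$ with one just above:

```python
import numpy as np

# Harmonic oscillator H = (p^2 + omega^2 q^2)/2: leapfrog is stable for
# h*omega < 2 and blows up past it. Compare h*omega = 1.9 vs 2.1.

def simulate(h, omega=1.0, steps=500):
    q, p = 1.0, 0.0
    for _ in range(steps):
        p -= (h / 2) * omega**2 * q
        q += h * p
        p -= (h / 2) * omega**2 * q
    return 0.5 * p**2 + 0.5 * omega**2 * q**2

print(simulate(h=1.9))   # bounded: energy oscillates but stays O(1)
print(simulate(h=2.1))   # diverges: energy grows geometrically
```

Below the threshold the one-step map has unit-modulus eigenvalues and the shadow energy confines the orbit; above it an eigenvalue leaves the unit circle and the energy grows geometrically.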
6. The Ge-Marsden No-Go Theorem
Can you build a symplectic integrator that also holds energy exactly?
Theorem (Ge-Marsden 1988): If a fixed-step integrator is symplectic and conserves $H$ exactly, and the system is non-integrable, then the integrator is a time reparametrization of the exact flow, so no genuinely new method can do both.
You have to pick between exact geometry and exact energy. Symplectic integrators pick geometry and this is the better trade because the trajectory stays near energy contours anyway and preserving phase-space volume is more useful statistically than hitting exact energy conservation.
7. HMC and the Kepler Problem
HMC in practice runs three steps.
- Sample $p \sim \mathcal{N}(0, M)$.
- Run Leapfrog for $L$ steps to get a proposal $(q^\ast, p^\ast)$.
- Metropolis accept/reject with probability $\alpha = \min\left(1, \, e^{H(q, p) - H(q^\ast, p^\ast)}\right)$.
If Leapfrog conserved $H$ perfectly then $\alpha$ would always be 1. It conserves the shadow Hamiltonian $\tilde H_h$ instead, and so the acceptance error scales as $O(h^2)$.
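Putting the three steps together, here is a minimal HMC sketch for a 1D standard normal target ($U(q) = q^2/2$). The step size, trajectory length, and function names are illustrative choices, not a reference implementation:

```python
import numpy as np

# Minimal HMC for a 1D standard normal target. Momentum resampling,
# an L-step leapfrog trajectory, then a Metropolis correction on the
# energy error of the trajectory.

def hmc_sample(n_samples=2000, h=0.2, L=20, seed=0):
    rng = np.random.default_rng(seed)
    U = lambda q: 0.5 * q**2       # target: standard normal
    grad_U = lambda q: q
    q = 0.0
    samples = []
    accepts = 0
    for _ in range(n_samples):
        p = rng.normal()                        # 1. sample momentum
        q_new, p_new = q, p
        p_new -= (h / 2) * grad_U(q_new)        # 2. L leapfrog steps
        for i in range(L):
            q_new += h * p_new
            if i < L - 1:
                p_new -= h * grad_U(q_new)
        p_new -= (h / 2) * grad_U(q_new)
        dH = (U(q_new) + 0.5 * p_new**2) - (U(q) + 0.5 * p**2)
        if rng.uniform() < np.exp(-dH):         # 3. Metropolis correction
            q = q_new
            accepts += 1
        samples.append(q)
    return np.array(samples), accepts / n_samples

samples, acc = hmc_sample()
print(acc)                          # near 1 at this step size
print(samples.mean(), samples.std())
```

Because the leapfrog energy error is tiny at this step size, almost every proposal is accepted, and the samples recover the target's mean and standard deviation.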
A good test case is the Kepler problem with $V(q) = -1/\lVert q \rVert$. Leapfrog gives a stable orbit that precesses a little. RK4 bleeds energy until the planet spirals into the sun, or pumps energy in and flings it out.
```python
import numpy as np
import matplotlib.pyplot as plt

def kepler_potential(q):
    return -1.0 / np.linalg.norm(q)

def kepler_grad(q):
    return q / np.linalg.norm(q)**3

def leapfrog_kepler(steps=2000, h=0.01):
    q = np.array([1.0, 0.0])
    p = np.array([0.0, 0.5])  # Low velocity for elliptical orbit
    path = []
    energy = []
    for _ in range(steps):
        p -= (h/2) * kepler_grad(q)
        q += h * p
        p -= (h/2) * kepler_grad(q)
        path.append(q.copy())
        energy.append(0.5*np.dot(p, p) + kepler_potential(q))
    return np.array(path), np.array(energy)

def rk4_kepler(steps=2000, h=0.01):
    z = np.array([1.0, 0.0, 0.0, 0.5])  # q1, q2, p1, p2
    path = []
    energy = []
    def f(state):
        q = state[:2]
        p = state[2:]
        return np.concatenate([p, -kepler_grad(q)])
    for _ in range(steps):
        k1 = f(z)
        k2 = f(z + h/2 * k1)
        k3 = f(z + h/2 * k2)
        k4 = f(z + h * k3)
        z += (h/6) * (k1 + 2*k2 + 2*k3 + k4)
        path.append(z[:2].copy())
        p = z[2:]
        energy.append(0.5*np.dot(p, p) + kepler_potential(z[:2]))
    return np.array(path), np.array(energy)

def plot_kepler_results():
    path_lf, e_lf = leapfrog_kepler()
    path_rk, e_rk = rk4_kepler()
    plt.figure(figsize=(12, 5))
    # Orbits
    plt.subplot(1, 2, 1)
    plt.plot(path_lf[:,0], path_lf[:,1], label='Leapfrog (Symplectic)')
    plt.plot(path_rk[:,0], path_rk[:,1], 'r--', label='RK4 (Energy Drift)')
    plt.scatter([0], [0], color='orange', s=100, label='Sun')
    plt.title('Kepler Orbits (2000 steps)')
    plt.legend()
    # Energy error
    plt.subplot(1, 2, 2)
    plt.plot(e_lf - e_lf[0], label='Leapfrog Error')
    plt.plot(e_rk - e_rk[0], label='RK4 Error')
    plt.title('Energy Fluctuation vs Drift')
    plt.ylabel('H - H0')
    plt.legend()
    plt.show()
```

8. RATTLE (Constrained Systems)
Systems with holonomic constraints like fixed bond lengths in a molecule break plain Leapfrog within a few steps. The fix is to throw in Lagrange multipliers.
RATTLE adds a multiplier force $-\lambda \nabla g(q)$ and solves for $\lambda$ at each step so that both the position constraint $g(q_{n+1}) = 0$ and the hidden velocity constraint $\nabla g(q_{n+1})^\top M^{-1} p_{n+1} = 0$ hold. Its particular staggering of multiplier updates keeps the symplectic form on the constrained manifold. The proof is not obvious but the result holds.
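A sketch of a single RATTLE step for one scalar constraint, using a planar pendulum on the unit circle ($g(q) = \lVert q \rVert^2 - 1$, gravity $V(q) = q_2$, unit mass) as the toy system. The Newton iteration for the position multiplier and the linear solve for the velocity multiplier are illustrative implementation choices, not the only ones:

```python
import numpy as np

# One RATTLE step for a single scalar constraint g(q) = 0. The position
# multiplier lam enforces g(q_{n+1}) = 0 (Newton iteration); the velocity
# multiplier mu enforces grad_g(q_{n+1}) . p_{n+1} = 0 (linear solve).

def rattle_step(q, p, h, grad_V, g, grad_g, newton_iters=20):
    G0 = grad_g(q)
    lam = 0.0
    for _ in range(newton_iters):                  # solve g(q_new) = 0 for lam
        p_half = p - (h / 2) * (grad_V(q) + lam * G0)
        q_new = q + h * p_half
        r = g(q_new)
        dr = -(h * h / 2) * (grad_g(q_new) @ G0)   # d g(q_new) / d lam
        lam -= r / dr
    p_half = p - (h / 2) * (grad_V(q) + lam * G0)
    q_new = q + h * p_half
    G1 = grad_g(q_new)
    p_unc = p_half - (h / 2) * grad_V(q_new)
    mu = (G1 @ p_unc) / ((h / 2) * (G1 @ G1))      # linear solve for mu
    p_new = p_unc - (h / 2) * mu * G1
    return q_new, p_new

g = lambda q: q @ q - 1.0
grad_g = lambda q: 2.0 * q
grad_V = lambda q: np.array([0.0, 1.0])            # V(q) = q[1] (gravity)

q = np.array([1.0, 0.0])
p = np.array([0.0, 0.0])
for _ in range(1000):
    q, p = rattle_step(q, p, 0.01, grad_V, g, grad_g)
print(g(q), grad_g(q) @ p)    # both stay ~0: constraints hold
```

Plain leapfrog with a projection tacked on afterwards would also keep $g(q) = 0$, but it is the staggered multiplier solve above that preserves the symplectic form on the constraint manifold.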
9. Adaptive Step Sizes and Symplecticity
Naively varying $h$ based on the state wipes out symplecticity. If $h$ depends on $(q, p)$ then the composed map is no longer symplectic and the energy drift comes right back.
The fix is to reparametrize time: set $dt = g(q, p) \, d\tau$ and integrate the modified Hamiltonian $\bar H = g(q, p)\left(H(q, p) - H_0\right)$, where $H_0$ is the initial energy. Fixed steps in $\tau$ give you variable resolution in $t$ and the geometric invariants stay intact.
10. Neural Hamiltonian Flows
If you parametrize a Hamiltonian as a neural network and run the flow with Leapfrog layers then you get a normalizing flow that is exactly reversible and volume-preserving and physics-aware all at once. Exact reversibility means you can reconstruct the latent noise from the output with no approximation. Volume-preserving means the Jacobian determinant is just 1 and the change-of-variables formula simplifies. And physics-aware means the network learns the conserved quantities of whatever dynamical system it is modeling.
This is the idea behind Hamiltonian Neural Networks and Symplectic Adjoint Networks. The symplecticity is what makes the normalizing flow tractable because without it computing the Jacobian determinant would be too expensive to bother with.
11. The Edge of Stability
The shadow Hamiltonian is an asymptotic series. For small $h$ it nails the numerical flow almost perfectly. But in practice you want the largest stable $h$ you can get, and as $h$ grows the error terms grow too. Past a threshold the series diverges, orbits leave the invariant KAM tori and fill phase space, and HMC acceptance rates collapse.
The practical problem is always the same: push $h$ as close to the convergence boundary as you can without crossing it.
Liouville’s Theorem (Proof)
Theorem: If $\dot z = f(z)$ and $\nabla \cdot f = 0$ then phase-space volume is conserved.
Let $V(t)$ be the volume of a set $A$ carried along by the flow, $V(t) = \int_A \det D\phi_t(z) \, dz$.
Set $J(t) = D\phi_t(z)$. Jacobi's formula gives $\frac{d}{dt} \det J = \det J \cdot \operatorname{tr}\left(J^{-1} \dot J\right)$.
With $\dot J = Df(\phi_t(z)) \, J$ the chain rule gives $\operatorname{tr}(J^{-1} \dot J) = \operatorname{tr}(Df) = \nabla \cdot f$ and so $\frac{d}{dt} \det J = (\nabla \cdot f) \det J$.
For a Hamiltonian vector field $f = \left(\frac{\partial H}{\partial p}, -\frac{\partial H}{\partial q}\right)$ the divergence is $\nabla \cdot f = \frac{\partial^2 H}{\partial q \, \partial p} - \frac{\partial^2 H}{\partial p \, \partial q} = 0$.
So $\det J$ is constant in time, and since $\det J(0) = 1$ we get $\det J(t) = 1$, hence $V(t) = V(0)$, for all time.
Symplectic Euler
Order 1 symplectic method and simple but worth knowing.
- $p_{n+1} = p_n - h \, \nabla V(q_n)$
- $q_{n+1} = q_n + h \, p_{n+1}$ (use the new momentum)
The Jacobian (one degree of freedom, unit mass) is

$$\frac{\partial (q_{n+1}, p_{n+1})}{\partial (q_n, p_n)} = \begin{pmatrix} 1 - h^2 V''(q_n) & h \\ -h \, V''(q_n) & 1 \end{pmatrix}.$$

The determinant is $(1 - h^2 V'') + h^2 V'' = 1$, and in one degree of freedom area preservation is exactly symplecticity, so that confirms the map is symplectic.
It is not time-reversible though. Composing half a step of this variant with half a step of its adjoint (drift first, then kick) gives you Leapfrog.
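A quick numerical comparison against explicit Euler on the harmonic oscillator makes the difference vivid (step size and step count are arbitrary):

```python
import numpy as np

# Symplectic Euler vs explicit Euler on H = (p^2 + q^2)/2. Explicit Euler
# gains energy every step by a factor (1 + h^2); symplectic Euler's
# energy error stays bounded for all time.

def energy_after(steps, h, symplectic):
    q, p = 1.0, 0.0
    for _ in range(steps):
        if symplectic:
            p = p - h * q        # kick first...
            q = q + h * p        # ...then drift with the NEW momentum
        else:
            q_old = q
            q = q + h * p        # explicit Euler: both updates
            p = p - h * q_old    # use the old state
    return 0.5 * (q**2 + p**2)

print(energy_after(5000, 0.05, symplectic=True))   # stays near 0.5
print(energy_after(5000, 0.05, symplectic=False))  # grows like (1 + h^2)^n
```

The only difference between the two branches is which momentum the drift uses, and that single change is what makes the map a unit-determinant shear composition instead of a uniformly expanding one.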
KAM and Long-Term Stability
How long does shadow Hamiltonian stability actually last?
KAM theory says that for small perturbations of integrable systems most invariant tori survive. A symplectic integrator is exactly that kind of perturbation. Backward error analysis shows the discrete trajectory stays close to the exact flow of $\tilde H_h$ for times of order $e^{c/h}$.
That is exponentially long. In practice symplectic integrators track a nearby stable system for astronomical timescales literally. This is why people use them to study Solar System stability over billions of years where RK4 would have diverged within a few millennia.
Exterior Calculus Perspective
Symplecticity is area preservation said carefully in the language of exterior calculus.
$\omega = \sum_i dq^i \wedge dp_i$ measures oriented area in phase space, and a map $\phi$ is symplectic iff $\phi^* \omega = \omega$. By Stokes' theorem this means the loop integral $\oint_\gamma p \, dq$ is preserved for every closed curve $\gamma$ in phase space.
For HMC this means the integrator holds phase-space structure with no compression and no expansion. And without this the Markov chain would pick up a bias toward high-density regions.
Generating Functions
Every symplectic map can locally be written through a generating function $S(q, P)$ with $p = \frac{\partial S}{\partial q}$ and $Q = \frac{\partial S}{\partial P}$.
When $S$ satisfies the Hamilton-Jacobi equation

$$\frac{\partial S}{\partial t} + H\!\left(q, \frac{\partial S}{\partial q}\right) = 0,$$

it pins down the exact flow. Symplectic integrators match up with approximate generating functions $S_h$. The approximation still defines an exactly symplectic map and that is why the method works even though $S_h$ is inexact.
Stochastic Hamiltonian Dynamics
For Bayesian inference we often want Hamiltonian Langevin dynamics (underdamped Langevin at unit temperature):

$$dq = M^{-1} p \, dt, \qquad dp = -\nabla V(q) \, dt - \gamma \, p \, dt + \sqrt{2 \gamma} \, M^{1/2} \, dW.$$

The friction term breaks exact volume preservation but the Gibbs measure $\propto e^{-H(q, p)}$ is still preserved. Discretization uses a stochastic splitting: alternate Hamiltonian steps with friction-plus-noise steps.
The Hamiltonian part stays exactly symplectic and the Langevin part is an OU process and gets solved exactly. This hybrid is the workhorse behind large-scale posterior sampling in deep learning.
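A minimal sketch of that splitting for a 1D standard normal target at unit temperature, with the friction-plus-noise part solved exactly as an OU step (friction coefficient and step size are illustrative):

```python
import numpy as np

# Stochastic splitting for underdamped Langevin dynamics: a symplectic
# leapfrog step for the Hamiltonian part, then an exactly solved OU step
# p -> c1*p + c2*N(0,1) for the friction + noise part (T = 1).

def langevin_chain(n_steps=20000, h=0.1, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    grad_U = lambda q: q          # target: standard normal, U = q^2/2
    c1 = np.exp(-gamma * h)       # exact OU damping over one step
    c2 = np.sqrt(1.0 - c1**2)     # matching stationary noise amplitude
    q, p = 0.0, 0.0
    qs = np.empty(n_steps)
    for i in range(n_steps):
        # Hamiltonian part: one leapfrog step.
        p -= (h / 2) * grad_U(q)
        q += h * p
        p -= (h / 2) * grad_U(q)
        # OU part, solved exactly.
        p = c1 * p + c2 * rng.normal()
        qs[i] = q
    return qs

qs = langevin_chain()
print(qs.mean(), qs.var())   # near 0 and 1 up to O(h^2) bias
```

Because the OU step is exact and preserves the Gaussian momentum marginal, the only discretization error left is the leapfrog's $O(h^2)$ bias in the position marginal.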
Geodesic Integration
When the metric depends on the state, as it does in general relativity and robotics, the kinetic energy becomes $T(q, p) = \frac{1}{2} p^\top G(q)^{-1} p$ and the Hamiltonian is no longer separable. Standard Leapfrog breaks.
The options are implicit methods or geodesic splitting, and the latter means decomposing the metric when it has symmetry or iterating a fixed-point equation for the implicit half-steps. The trajectory then follows the curvature instead of flying off the tangent plane.
Reversibility and Debugging
Symplectic integrators are time-reversible. Run $L$ steps, negate $p$, run $L$ more steps, and you should land back at the starting point with momentum flipped, up to floating-point noise.
In large physics simulations people use integer arithmetic to make this exactly reversible at the bit level. This lets you do reversible debugging and rewind from a crash to find the exact moment things went singular without storing checkpoints. The bit-level symmetry is just taken seriously.
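The reversibility check is worth keeping in any integrator test suite. A sketch with the pendulum potential:

```python
import numpy as np

# Reversibility check: integrate forward 1000 leapfrog steps, negate the
# momentum, integrate 1000 more, and compare to the starting point. For
# a symmetric integrator the mismatch is floating-point noise only.

def leapfrog(q, p, h, steps, grad_V):
    for _ in range(steps):
        p = p - (h / 2) * grad_V(q)
        q = q + h * p
        p = p - (h / 2) * grad_V(q)
    return q, p

grad_V = np.sin                      # pendulum, V(q) = -cos q
q0, p0 = 0.9, 0.4
q1, p1 = leapfrog(q0, p0, h=0.1, steps=1000, grad_V=grad_V)
q2, p2 = leapfrog(q1, -p1, h=0.1, steps=1000, grad_V=grad_V)
print(abs(q2 - q0), abs(p2 + p0))    # both tiny: roundoff only
```

Note the returned momentum is $-p_0$, not $p_0$: the reverse trajectory retraces the forward one with the momentum sign flipped throughout.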
The Symplectic Group
The symplectic matrices form a Lie group $Sp(2n, \mathbb{R})$, and $S \in Sp(2n, \mathbb{R})$ iff $S^\top J S = J$.
Three properties worth noting. First, $\det S = 1$, so volume and orientation are preserved. Second, the eigenvalues come in quartets: if $\lambda$ is an eigenvalue then so are $1/\lambda$, $\bar{\lambda}$ and $1/\bar{\lambda}$. This spectral symmetry stops eigenvalues from drifting into the unstable region without first colliding pairwise at the unit circle. And third, the dimension is $n(2n+1)$.
Geometric integration at its core is about keeping the numerical update on this group or near it despite the discretization error.
Multi-symplectic Methods for PDEs
Hamiltonian PDEs like the wave equation and Schrodinger often have multi-symplectic structure with symplectic forms in both time and space.
Standard symplectic integrators only handle the temporal part. Multi-symplectic integrators like the Preissman box scheme hold a local energy-momentum conservation law per spacetime cell and kill the unphysical spatial oscillations that otherwise mess up the wave-packet shape over long propagation.
Optimal Transport
Hamiltonian flows connect to optimal transport in a way I think deserves more attention. Brenier's theorem says the optimal map pushing a measure $\mu$ to $\nu$ under quadratic cost is the gradient of a convex function. In the Benamou-Brenier dynamic picture the optimal paths are geodesics in the Wasserstein space of probability measures.
Integrating these Wasserstein flows geometrically means holding onto displacement convexity of the energy. Symplectic discretizations of the OT problem keep the transport maps from leaking probability mass into regions where it should not go.
Why Automatic Differentiation Matters Here
Leapfrog needs $\nabla V(q)$. Finite differences like $\partial_i V \approx \left(V(q + \epsilon e_i) - V(q)\right) / \epsilon$ bring in $O(\epsilon)$ errors that break the symplectic structure. Even a small gradient error acts as a non-conservative force and wipes out the shadow Hamiltonian's stability guarantees.
Automatic differentiation in JAX and PyTorch computes gradients to machine precision and keeps the numerical vector field gradient-like by construction. This is arguably the single biggest practical advance that makes geometric integration work for high-dimensional models.
Historical notes
Hamiltonian mechanics goes back to William Rowan Hamilton’s formulation in 1833. The first explicit symplectic integrator which is Leapfrog or Stormer-Verlet showed up in De Vogelaere’s 1956 work and Loup Verlet independently rediscovered it for molecular dynamics in 1967 which gave it broad practical impact. Hybrid Monte Carlo came along in 1987 from Duane et al. and tied symplectic integration to Bayesian sampling. The next year Ge and Marsden proved their No-Go theorem which says no fixed-step integrator can be both symplectic and energy-conserving for non-integrable systems. The definitive textbook treatment came in 2006 with Hairer and Lubich and Wanner’s “Geometric Numerical Integration.” More recently Hoffman and Gelman introduced the No-U-Turn Sampler in 2014 and Greydanus et al. proposed Hamiltonian Neural Networks in 2019 and pulled geometric integration into deep learning.
Definitions
- Cotangent bundle ($T^*Q$): the space of all possible positions and momenta for a mechanical system. Each point $(q, p)$ specifies a complete state.
- Holonomic constraint: a restriction expressed purely in terms of coordinates, $g(q) = 0$, e.g. a fixed bond length. RATTLE enforces these within the symplectic framework.
- Poisson bracket ($\{F, G\}$): the Lie algebra operation on phase-space functions, defined as $\{F, G\} = \sum_i \left( \frac{\partial F}{\partial q^i} \frac{\partial G}{\partial p_i} - \frac{\partial F}{\partial p_i} \frac{\partial G}{\partial q^i} \right)$. Conservation laws correspond to functions whose Poisson bracket with $H$ vanishes.
- Shadow Hamiltonian ($\tilde H_h$): the exact conserved quantity of the discrete map. It differs from $H$ by $O(h^2)$ for leapfrog.
- Symplectic 2-form ($\omega$): the geometric object measuring oriented area in phase space. A map preserving $\omega$ is symplectic.
- Symplectic manifold: a smooth manifold equipped with a closed, non-degenerate 2-form. Phase space is the prototypical example.
References
1. Hairer, E., Lubich, C., & Wanner, G. (2006). “Geometric Numerical Integration”. The definitive reference for structure-preserving algorithms.
2. Leimkuhler, B., & Reich, S. (2004). “Simulating Hamiltonian Dynamics”. Bridge between molecular dynamics and numerical analysis.
3. Neal, R. M. (2011). “MCMC using Hamiltonian dynamics”. The foundational HMC paper for the machine learning community.
4. Marsden, J. E., & West, M. (2001). “Discrete mechanics and variational integrators”. The seminal paper on deriving integrators from the discrete action principle.
5. Munthe-Kaas, H. (1998). “High order Runge-Kutta methods on manifolds”. Introduced the RKMK methods for Lie group integration.
6. Bridges, T. J., & Reich, S. (2001). “Multi-symplectic integrators: numerical methods for Hamiltonian PDEs”. The primary source for multi-symplectic geometry in wave systems.
7. Ambrosio, L., Gigli, N., & Savare, G. (2008). “Gradient Flows in Metric Spaces”. Connection between Wasserstein geometry and optimal transport.