Geometric Integration & Hamiltonian Monte Carlo
Euler and Runge-Kutta treat the state space as flat: they chase down local truncation error and wipe out the global invariants along the way. Run them on a Hamiltonian system and the energy drifts until the simulation blows up.
In HMC that energy drift breaks Detailed Balance and wrecks the sampler.
We want integrators that hold onto structure like symplecticity and volume and momentum by design and not by accident, and that is what geometric integration is. These notes build up the geometry of cotangent bundles and shadow Hamiltonians and then poke at the practical edge cases you hit when you actually implement the thing.
1. The Symplectic Setup
A Hamiltonian system lives on a symplectic manifold, and usually that manifold is the cotangent bundle $T^*Q$ with coordinates $(q, p)$ and the standard 2-form $\omega = \sum_i dq^i \wedge dp_i$.
Given a Hamiltonian $H(q, p)$, the vector field $X_H$ satisfies $\iota_{X_H}\omega = dH$, and in coordinates this gives Hamilton's equations

$$\dot q = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial q}.$$

Or you write it as $\dot z = J \nabla H(z)$ with $z = (q, p)$ and $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$.
The symplectic structure hands the phase-space functions a Lie algebra through the Poisson bracket $\{F, G\} = \sum_i \left( \frac{\partial F}{\partial q^i} \frac{\partial G}{\partial p_i} - \frac{\partial F}{\partial p_i} \frac{\partial G}{\partial q^i} \right)$.
A flow $\phi_t$ is symplectic when $\phi_t^* \omega = \omega$, and this is the same as saying $(D\phi_t)^\top J \, (D\phi_t) = J$. Every Hamiltonian flow is symplectic and by Liouville's theorem it holds the volume form $\omega^n$ fixed too.
2. Splitting and Why Leapfrog Works
For a general $H$ we cannot solve the flow exactly. But we can split it. If $H = T(p) + V(q)$ and $T$ and $V$ are each integrable on their own, then their exact flows $\phi_t^T$ and $\phi_t^V$ are symplectic, and stacking symplectic maps together gives you another symplectic map.
Leapfrog is the symmetric splitting

$$\Phi_h = \phi^V_{h/2} \circ \phi^T_{h} \circ \phi^V_{h/2},$$

i.e. a half kick $p \leftarrow p - \frac{h}{2} \nabla V(q)$, a full drift $q \leftarrow q + h \, M^{-1} p$, and another half kick.
Each sub-map is a shear: its Jacobian is a triangular matrix with unit diagonal, so the determinant is exactly 1. Symplecticity holds for any step size, stable or not. The geometry is exact and only the trajectory gets approximated.
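This volume preservation is easy to verify numerically. A toy check below, using the pendulum potential $V(q) = -\cos q$ as an arbitrary nonlinear choice: estimate the Jacobian of one leapfrog step by central differences and confirm the determinant is 1.

```python
import numpy as np

# Toy check that one leapfrog step is volume-preserving for a nonlinear
# potential (pendulum, V(q) = -cos q, so V'(q) = sin q). The Jacobian of
# the step map is estimated by central differences; its determinant
# should equal 1 up to finite-difference error, even at a large step size.

def leapfrog_step(q, p, h, grad_V):
    p = p - (h / 2) * grad_V(q)
    q = q + h * p
    p = p - (h / 2) * grad_V(q)
    return q, p

def step_jacobian(q, p, h, grad_V, eps=1e-6):
    def F(z):
        return np.array(leapfrog_step(z[0], z[1], h, grad_V))
    z0 = np.array([q, p])
    J = np.zeros((2, 2))
    for i in range(2):
        dz = np.zeros(2)
        dz[i] = eps
        J[:, i] = (F(z0 + dz) - F(z0 - dz)) / (2 * eps)
    return J

J = step_jacobian(q=0.7, p=0.3, h=0.5, grad_V=np.sin)
print(np.linalg.det(J))   # ~1.0 even at this large step size
```

The determinant is exactly 1 analytically; the residual you see is pure finite-difference noise.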
3. Lie Group Integrators (Munthe-Kaas)
The standard update $y_{n+1} = y_n + h f(y_n)$ assumes the state space is closed under addition and that falls apart on a Lie group like $SO(3)$ because adding a tangent vector to a rotation matrix spits out something that is no longer a rotation matrix.
The fix is to use the exponential map from the Lie algebra:

$$y_{n+1} = \exp(h \, \Omega_n) \, y_n,$$

where $\Omega_n$ lives in the Lie algebra $\mathfrak{g}$. Munthe-Kaas or RKMK methods push Runge-Kutta onto manifolds by solving the ODE inside the Lie algebra and mapping back through $\exp$. Rotations stay rotations and rigid bodies keep their geometric constraints.
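A minimal sketch of the idea on $SO(3)$, assuming a constant body angular velocity `omega` so a single exponential per step suffices (real RKMK methods also update the algebra element per stage). The exponential of a skew matrix is computed with Rodrigues' formula, so every update is an exact rotation and $R$ stays on the group, while the naive additive update drifts off it.

```python
import numpy as np

# Lie group Euler step for a rotating frame with constant body angular
# velocity omega (a toy case). expm of the skew matrix uses Rodrigues'
# formula, so R_lie stays orthogonal; the additive update R += h*R*hat(w)
# leaves SO(3) and its orthogonality error compounds.

def hat(w):
    # Map R^3 -> so(3): skew-symmetric matrix with hat(w) @ v = w x v.
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def expm_so3(w):
    # Rodrigues' formula for exp(hat(w)).
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    K = hat(w / th)
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)

omega = np.array([0.3, -0.2, 0.9])
h = 0.1
R_lie = np.eye(3)
R_naive = np.eye(3)
for _ in range(1000):
    R_lie = R_lie @ expm_so3(h * omega)            # stays on SO(3)
    R_naive = R_naive + h * R_naive @ hat(omega)   # drifts off SO(3)

print(np.linalg.norm(R_lie.T @ R_lie - np.eye(3)))    # tiny: roundoff only
print(np.linalg.norm(R_naive.T @ R_naive - np.eye(3)))  # large: left the group
```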
4. Variational Integrators
Lagrangian mechanics extremizes the action

$$S[q] = \int_0^T L(q, \dot q) \, dt.$$

Discretize the action directly and not the ODE: choose a discrete Lagrangian $L_d(q_k, q_{k+1}) \approx \int_{t_k}^{t_{k+1}} L \, dt$ and sum $S_d = \sum_k L_d(q_k, q_{k+1})$.
Apply Hamilton's principle ($\delta S_d = 0$) and you get the discrete Euler-Lagrange equations

$$D_2 L_d(q_{k-1}, q_k) + D_1 L_d(q_k, q_{k+1}) = 0.$$
The key point is that every variational integrator is symplectic because it chops up the variational principle instead of the differential equation. The discrete symplectic form falls out for free. Variational integration is the standard in orbital mechanics and celestial simulation and I think it is underappreciated in the ML community.
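For the rectangle-rule discrete Lagrangian $L_d(a, b) = h\left[\frac{1}{2}\left(\frac{b-a}{h}\right)^2 - V(a)\right]$, the discrete Euler-Lagrange equations collapse to the Störmer-Verlet recurrence $q_{k+1} = 2 q_k - q_{k-1} - h^2 V'(q_k)$. A quick numerical check (pendulum potential and step size are illustrative choices) that this position-only recurrence reproduces leapfrog's positions:

```python
import numpy as np

# The DEL equations for the rectangle-rule discrete Lagrangian reduce to
# q_{k+1} = 2 q_k - q_{k-1} - h^2 V'(q_k) (Stormer-Verlet). Check that
# this matches the positions of kick-drift-kick leapfrog on the pendulum.

h = 0.1
grad_V = np.sin          # pendulum, V(q) = -cos q
q0, p0 = 0.5, 0.2

# Leapfrog positions.
q, p = q0, p0
lf = [q]
for _ in range(200):
    p = p - (h / 2) * grad_V(q)
    q = q + h * p
    p = p - (h / 2) * grad_V(q)
    lf.append(q)

# Variational (DEL) recurrence, seeded with the first two leapfrog points.
vi = [lf[0], lf[1]]
for _ in range(199):
    vi.append(2 * vi[-1] - vi[-2] - h**2 * grad_V(vi[-1]))

print(max(abs(a - b) for a, b in zip(lf, vi)))   # agree to roundoff
```

The agreement is exact in exact arithmetic: subtracting consecutive leapfrog drift equations eliminates the momenta and leaves precisely the DEL recurrence.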
5. Shadow Hamiltonians
What does Leapfrog actually solve? Not $H$ exactly. Instead there is a modified Hamiltonian $\tilde H_h$ whose exact flow runs through the numerical points $(q_n, p_n)$.
Write the Leapfrog operator using Lie operators as $\Phi_h = e^{\frac{h}{2} \hat V} \, e^{h \hat T} \, e^{\frac{h}{2} \hat V}$, where $\hat F = \{\cdot, F\}$.
By Baker-Campbell-Hausdorff this equals $e^{h \hat{\tilde H}_h}$ with

$$\tilde H_h = T + V + \frac{h^2}{12} \{T, \{T, V\}\} - \frac{h^2}{24} \{V, \{V, T\}\} + O(h^4),$$

and for a symmetric splitting the error terms show up only in even powers of $h$.
For separable $H = \frac{1}{2} p^\top M^{-1} p + V(q)$ this reads

$$\tilde H_h = H + \frac{h^2}{12} \, p^\top M^{-1} \nabla^2 V(q) \, M^{-1} p - \frac{h^2}{24} \, \nabla V(q)^\top M^{-1} \nabla V(q) + O(h^4).$$

$\tilde H_h$ is formally an infinite series and in general it diverges. But for small $h$ truncation works fine. For a harmonic oscillator with frequency $\omega$, stability asks for $h \omega < 2$.
Past that threshold the shadow Hamiltonian stops existing and the dynamics turn chaotic.
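That threshold is easy to see numerically. A minimal sketch on the unit-frequency harmonic oscillator, comparing a step size just below $h\omega = 2$ with one just above:

```python
import numpy as np

# Harmonic oscillator H = (p^2 + omega^2 q^2)/2: leapfrog is stable for
# h*omega < 2 and blows up past it. Compare h*omega = 1.9 vs 2.1.

def simulate(h, omega=1.0, steps=500):
    q, p = 1.0, 0.0
    for _ in range(steps):
        p -= (h / 2) * omega**2 * q
        q += h * p
        p -= (h / 2) * omega**2 * q
    return 0.5 * p**2 + 0.5 * omega**2 * q**2

print(simulate(h=1.9))   # bounded: energy oscillates but stays O(1)
print(simulate(h=2.1))   # diverges: energy grows geometrically
```

Below the threshold the one-step map has unit-modulus eigenvalues and the shadow energy confines the orbit; above it an eigenvalue leaves the unit circle and the energy grows geometrically.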
6. The Ge-Marsden No-Go Theorem
Can you build a symplectic integrator that also holds energy exactly?
Theorem (Ge-Marsden 1988): If a fixed-step integrator is symplectic and conserves $H$ exactly, and the system is non-integrable, then the integrator is a time reparametrization of the exact flow, so no genuinely new method can do both.
You have to pick between exact geometry and exact energy. Symplectic integrators pick geometry and this is the better trade because the trajectory stays near energy contours anyway and preserving phase-space volume is more useful statistically than hitting exact energy conservation.
7. HMC and the Kepler Problem
HMC in practice runs three steps.
- Sample $p \sim \mathcal{N}(0, M)$.
- Run Leapfrog for $L$ steps to get a proposal $(q^\ast, p^\ast)$.
- Metropolis accept/reject with probability $\alpha = \min\left(1, \, e^{H(q, p) - H(q^\ast, p^\ast)}\right)$.
If Leapfrog conserved $H$ perfectly then $\alpha$ would always be 1. It conserves the shadow Hamiltonian $\tilde H_h$ instead, and so the acceptance error scales as $O(h^2)$.
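Putting the three steps together, here is a minimal HMC sketch for a 1D standard normal target ($U(q) = q^2/2$). The step size, trajectory length, and function names are illustrative choices, not a reference implementation:

```python
import numpy as np

# Minimal HMC for a 1D standard normal target. Momentum resampling,
# an L-step leapfrog trajectory, then a Metropolis correction on the
# energy error of the trajectory.

def hmc_sample(n_samples=2000, h=0.2, L=20, seed=0):
    rng = np.random.default_rng(seed)
    U = lambda q: 0.5 * q**2       # target: standard normal
    grad_U = lambda q: q
    q = 0.0
    samples = []
    accepts = 0
    for _ in range(n_samples):
        p = rng.normal()                        # 1. sample momentum
        q_new, p_new = q, p
        p_new -= (h / 2) * grad_U(q_new)        # 2. L leapfrog steps
        for i in range(L):
            q_new += h * p_new
            if i < L - 1:
                p_new -= h * grad_U(q_new)
        p_new -= (h / 2) * grad_U(q_new)
        dH = (U(q_new) + 0.5 * p_new**2) - (U(q) + 0.5 * p**2)
        if rng.uniform() < np.exp(-dH):         # 3. Metropolis correction
            q = q_new
            accepts += 1
        samples.append(q)
    return np.array(samples), accepts / n_samples

samples, acc = hmc_sample()
print(acc)                          # near 1 at this step size
print(samples.mean(), samples.std())
```

Because the leapfrog energy error is tiny at this step size, almost every proposal is accepted, and the samples recover the target's mean and standard deviation.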
A good test case is the Kepler problem with $V(q) = -1/\lVert q \rVert$. Leapfrog gives a stable orbit that precesses a little. RK4 bleeds energy until the planet spirals into the sun, or pumps energy in and flings it out.
```python
import numpy as np
import matplotlib.pyplot as plt

def kepler_potential(q):
    return -1.0 / np.linalg.norm(q)

def kepler_grad(q):
    return q / np.linalg.norm(q)**3

def leapfrog_kepler(steps=2000, h=0.01):
    q = np.array([1.0, 0.0])
    p = np.array([0.0, 0.5])  # Low velocity for elliptical orbit
    path = []
    energy = []
    for _ in range(steps):
        p -= (h/2) * kepler_grad(q)
        q += h * p
        p -= (h/2) * kepler_grad(q)
        path.append(q.copy())
        energy.append(0.5*np.dot(p, p) + kepler_potential(q))
    return np.array(path), np.array(energy)

def rk4_kepler(steps=2000, h=0.01):
    z = np.array([1.0, 0.0, 0.0, 0.5])  # q1, q2, p1, p2
    path = []
    energy = []
    def f(state):
        q = state[:2]
        p = state[2:]
        return np.concatenate([p, -kepler_grad(q)])
    for _ in range(steps):
        k1 = f(z)
        k2 = f(z + h/2 * k1)
        k3 = f(z + h/2 * k2)
        k4 = f(z + h * k3)
        z += (h/6) * (k1 + 2*k2 + 2*k3 + k4)
        path.append(z[:2].copy())
        p = z[2:]
        energy.append(0.5*np.dot(p, p) + kepler_potential(z[:2]))
    return np.array(path), np.array(energy)

def plot_kepler_results():
    path_lf, e_lf = leapfrog_kepler()
    path_rk, e_rk = rk4_kepler()
    plt.figure(figsize=(12, 5))
    # Orbits
    plt.subplot(1, 2, 1)
    plt.plot(path_lf[:,0], path_lf[:,1], label='Leapfrog (Symplectic)')
    plt.plot(path_rk[:,0], path_rk[:,1], 'r--', label='RK4 (Energy Drift)')
    plt.scatter([0], [0], color='orange', s=100, label='Sun')
    plt.title('Kepler Orbits (2000 steps)')
    plt.legend()
    # Energy error
    plt.subplot(1, 2, 2)
    plt.plot(e_lf - e_lf[0], label='Leapfrog Error')
    plt.plot(e_rk - e_rk[0], label='RK4 Error')
    plt.title('Energy Fluctuation vs Drift')
    plt.ylabel('H - H0')
    plt.legend()
    plt.show()
```

8. RATTLE (Constrained Systems)
Systems with holonomic constraints like fixed bond lengths in a molecule break plain Leapfrog within a few steps. The fix is to throw in Lagrange multipliers.
RATTLE adds a multiplier force $-\lambda \nabla g(q)$ and solves for $\lambda$ at each step so that both the position constraint $g(q_{n+1}) = 0$ and the hidden velocity constraint $\nabla g(q_{n+1})^\top M^{-1} p_{n+1} = 0$ hold. Its particular staggering of multiplier updates keeps the symplectic form on the constrained manifold. The proof is not obvious but the result holds.
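A sketch of a single RATTLE step for one scalar constraint, using a planar pendulum on the unit circle ($g(q) = \lVert q \rVert^2 - 1$, gravity $V(q) = q_2$, unit mass) as the toy system. The Newton iteration for the position multiplier and the linear solve for the velocity multiplier are illustrative implementation choices, not the only ones:

```python
import numpy as np

# One RATTLE step for a single scalar constraint g(q) = 0. The position
# multiplier lam enforces g(q_{n+1}) = 0 (Newton iteration); the velocity
# multiplier mu enforces grad_g(q_{n+1}) . p_{n+1} = 0 (linear solve).

def rattle_step(q, p, h, grad_V, g, grad_g, newton_iters=20):
    G0 = grad_g(q)
    lam = 0.0
    for _ in range(newton_iters):                  # solve g(q_new) = 0 for lam
        p_half = p - (h / 2) * (grad_V(q) + lam * G0)
        q_new = q + h * p_half
        r = g(q_new)
        dr = -(h * h / 2) * (grad_g(q_new) @ G0)   # d g(q_new) / d lam
        lam -= r / dr
    p_half = p - (h / 2) * (grad_V(q) + lam * G0)
    q_new = q + h * p_half
    G1 = grad_g(q_new)
    p_unc = p_half - (h / 2) * grad_V(q_new)
    mu = (G1 @ p_unc) / ((h / 2) * (G1 @ G1))      # linear solve for mu
    p_new = p_unc - (h / 2) * mu * G1
    return q_new, p_new

g = lambda q: q @ q - 1.0
grad_g = lambda q: 2.0 * q
grad_V = lambda q: np.array([0.0, 1.0])            # V(q) = q[1] (gravity)

q = np.array([1.0, 0.0])
p = np.array([0.0, 0.0])
for _ in range(1000):
    q, p = rattle_step(q, p, 0.01, grad_V, g, grad_g)
print(g(q), grad_g(q) @ p)    # both stay ~0: constraints hold
```

Plain leapfrog with a projection tacked on afterwards would also keep $g(q) = 0$, but it is the staggered multiplier solve above that preserves the symplectic form on the constraint manifold.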
9. Adaptive Step Sizes and Symplecticity
Naively varying $h$ based on the state wipes out symplecticity. If $h$ depends on $(q, p)$ then the composed map is no longer symplectic and the energy drift comes right back.
The fix is to reparametrize time: set $dt = g(q, p) \, d\tau$ and integrate the modified Hamiltonian $\bar H = g(q, p)\left(H(q, p) - H_0\right)$, where $H_0$ is the initial energy. Fixed steps in $\tau$ give you variable resolution in $t$ and the geometric invariants stay intact.
10. Neural Hamiltonian Flows
If you parametrize a Hamiltonian as a neural network and run the flow with Leapfrog layers then you get a normalizing flow that is exactly reversible and volume-preserving and physics-aware all at once. Exact reversibility means you can reconstruct the latent noise from the output with no approximation. Volume-preserving means the Jacobian determinant is just 1 and the change-of-variables formula simplifies. And physics-aware means the network learns the conserved quantities of whatever dynamical system it is modeling.
This is the idea behind Hamiltonian Neural Networks and Symplectic Adjoint Networks. The symplecticity is what makes the normalizing flow tractable because without it computing the Jacobian determinant would be too expensive to bother with.
11. The Edge of Stability
The shadow Hamiltonian is an asymptotic series. For small $h$ it nails the numerical flow almost perfectly. But in practice you want the largest stable $h$ you can get, and as $h$ grows the error terms grow too. Past a threshold the series diverges, orbits leave the invariant KAM tori and fill phase space, and HMC acceptance rates collapse.
The practical problem is always the same: push $h$ as close to the convergence boundary as you can without crossing it.
Liouville’s Theorem (Proof)
Theorem: If $\dot z = f(z)$ and $\nabla \cdot f = 0$ then phase-space volume is conserved.
Let $V(t)$ be the volume of a set $A$ carried along by the flow, $V(t) = \int_A \det D\phi_t(z) \, dz$.
Set $J(t) = D\phi_t(z)$. Jacobi's formula gives $\frac{d}{dt} \det J = \det J \cdot \operatorname{tr}\left(J^{-1} \dot J\right)$.
With $\dot J = Df(\phi_t(z)) \, J$ the chain rule gives $\operatorname{tr}(J^{-1} \dot J) = \operatorname{tr}(Df) = \nabla \cdot f$ and so $\frac{d}{dt} \det J = (\nabla \cdot f) \det J$.
For a Hamiltonian vector field $f = \left(\frac{\partial H}{\partial p}, -\frac{\partial H}{\partial q}\right)$ the divergence is $\nabla \cdot f = \frac{\partial^2 H}{\partial q \, \partial p} - \frac{\partial^2 H}{\partial p \, \partial q} = 0$.
So $\det J$ is constant in time, and since $\det J(0) = 1$ we get $\det J(t) = 1$, hence $V(t) = V(0)$, for all time.
Symplectic Euler
Order 1 symplectic method and simple but worth knowing.
- $p_{n+1} = p_n - h \, \nabla V(q_n)$
- $q_{n+1} = q_n + h \, p_{n+1}$ (use the new momentum)
The Jacobian (one degree of freedom, unit mass) is

$$\frac{\partial (q_{n+1}, p_{n+1})}{\partial (q_n, p_n)} = \begin{pmatrix} 1 - h^2 V''(q_n) & h \\ -h \, V''(q_n) & 1 \end{pmatrix}.$$

The determinant is $(1 - h^2 V'') + h^2 V'' = 1$, and in one degree of freedom area preservation is exactly symplecticity, so that confirms the map is symplectic.
It is not time-reversible though. Composing half a step of this variant with half a step of its adjoint (drift first, then kick) gives you Leapfrog.
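A quick numerical comparison against explicit Euler on the harmonic oscillator makes the difference vivid (step size and step count are arbitrary):

```python
import numpy as np

# Symplectic Euler vs explicit Euler on H = (p^2 + q^2)/2. Explicit Euler
# gains energy every step by a factor (1 + h^2); symplectic Euler's
# energy error stays bounded for all time.

def energy_after(steps, h, symplectic):
    q, p = 1.0, 0.0
    for _ in range(steps):
        if symplectic:
            p = p - h * q        # kick first...
            q = q + h * p        # ...then drift with the NEW momentum
        else:
            q_old = q
            q = q + h * p        # explicit Euler: both updates
            p = p - h * q_old    # use the old state
    return 0.5 * (q**2 + p**2)

print(energy_after(5000, 0.05, symplectic=True))   # stays near 0.5
print(energy_after(5000, 0.05, symplectic=False))  # grows like (1 + h^2)^n
```

The only difference between the two branches is which momentum the drift uses, and that single change is what makes the map a unit-determinant shear composition instead of a uniformly expanding one.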
KAM and Long-Term Stability
How long does shadow Hamiltonian stability actually last?
KAM theory says that for small perturbations of integrable systems most invariant tori survive. A symplectic integrator is exactly that kind of perturbation. Backward error analysis shows the discrete trajectory stays close to the exact flow of $\tilde H_h$ for times of order $e^{c/h}$.
That is exponentially long. In practice symplectic integrators track a nearby stable system for astronomical timescales literally. This is why people use them to study Solar System stability over billions of years where RK4 would have diverged within a few millennia.
Exterior Calculus Perspective
Symplecticity is area preservation said carefully in the language of exterior calculus.
$\omega = \sum_i dq^i \wedge dp_i$ measures oriented area in phase space, and a map $\phi$ is symplectic iff $\phi^* \omega = \omega$. By Stokes' theorem this means the loop integral $\oint_\gamma p \, dq$ is preserved for every closed curve $\gamma$ in phase space.
For HMC this means the integrator holds phase-space structure with no compression and no expansion. And without this the Markov chain would pick up a bias toward high-density regions.
Generating Functions
Every symplectic map can locally be written through a generating function $S(q, P)$ with $p = \frac{\partial S}{\partial q}$ and $Q = \frac{\partial S}{\partial P}$.
When $S$ satisfies the Hamilton-Jacobi equation

$$\frac{\partial S}{\partial t} + H\!\left(q, \frac{\partial S}{\partial q}\right) = 0,$$

it pins down the exact flow. Symplectic integrators match up with approximate generating functions $S_h$. The approximation still defines an exactly symplectic map and that is why the method works even though $S_h$ is inexact.
Stochastic Hamiltonian Dynamics
For Bayesian inference we often want Hamiltonian Langevin dynamics (underdamped Langevin at unit temperature):

$$dq = M^{-1} p \, dt, \qquad dp = -\nabla V(q) \, dt - \gamma \, p \, dt + \sqrt{2 \gamma} \, M^{1/2} \, dW.$$

The friction term breaks exact volume preservation but the Gibbs measure $\propto e^{-H(q, p)}$ is still preserved. Discretization uses a stochastic splitting: alternate Hamiltonian steps with friction-plus-noise steps.
The Hamiltonian part stays exactly symplectic and the Langevin part is an OU process and gets solved exactly. This hybrid is the workhorse behind large-scale posterior sampling in deep learning.
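A minimal sketch of that splitting for a 1D standard normal target at unit temperature, with the friction-plus-noise part solved exactly as an OU step (friction coefficient and step size are illustrative):

```python
import numpy as np

# Stochastic splitting for underdamped Langevin dynamics: a symplectic
# leapfrog step for the Hamiltonian part, then an exactly solved OU step
# p -> c1*p + c2*N(0,1) for the friction + noise part (T = 1).

def langevin_chain(n_steps=20000, h=0.1, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    grad_U = lambda q: q          # target: standard normal, U = q^2/2
    c1 = np.exp(-gamma * h)       # exact OU damping over one step
    c2 = np.sqrt(1.0 - c1**2)     # matching stationary noise amplitude
    q, p = 0.0, 0.0
    qs = np.empty(n_steps)
    for i in range(n_steps):
        # Hamiltonian part: one leapfrog step.
        p -= (h / 2) * grad_U(q)
        q += h * p
        p -= (h / 2) * grad_U(q)
        # OU part, solved exactly.
        p = c1 * p + c2 * rng.normal()
        qs[i] = q
    return qs

qs = langevin_chain()
print(qs.mean(), qs.var())   # near 0 and 1 up to O(h^2) bias
```

Because the OU step is exact and preserves the Gaussian momentum marginal, the only discretization error left is the leapfrog's $O(h^2)$ bias in the position marginal.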
Geodesic Integration
When the metric depends on the state, as it does in general relativity and robotics, the kinetic energy becomes $T(q, p) = \frac{1}{2} p^\top G(q)^{-1} p$ and the Hamiltonian is no longer separable. Standard Leapfrog breaks.
The options are implicit methods or geodesic splitting, and the latter means decomposing the metric when it has symmetry or iterating a fixed-point equation for the implicit half-steps. The trajectory then follows the curvature instead of flying off the tangent plane.
Reversibility and Debugging
Symplectic integrators are time-reversible. Run $L$ steps, negate $p$, run $L$ more steps, and you should land back at the starting point with momentum flipped, up to floating-point noise.
In large physics simulations people use integer arithmetic to make this exactly reversible at the bit level. This lets you do reversible debugging and rewind from a crash to find the exact moment things went singular without storing checkpoints. The bit-level symmetry is just taken seriously.
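The reversibility check is worth keeping in any integrator test suite. A sketch with the pendulum potential:

```python
import numpy as np

# Reversibility check: integrate forward 1000 leapfrog steps, negate the
# momentum, integrate 1000 more, and compare to the starting point. For
# a symmetric integrator the mismatch is floating-point noise only.

def leapfrog(q, p, h, steps, grad_V):
    for _ in range(steps):
        p = p - (h / 2) * grad_V(q)
        q = q + h * p
        p = p - (h / 2) * grad_V(q)
    return q, p

grad_V = np.sin                      # pendulum, V(q) = -cos q
q0, p0 = 0.9, 0.4
q1, p1 = leapfrog(q0, p0, h=0.1, steps=1000, grad_V=grad_V)
q2, p2 = leapfrog(q1, -p1, h=0.1, steps=1000, grad_V=grad_V)
print(abs(q2 - q0), abs(p2 + p0))    # both tiny: roundoff only
```

Note the returned momentum is $-p_0$, not $p_0$: the reverse trajectory retraces the forward one with the momentum sign flipped throughout.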
The Symplectic Group
The symplectic matrices form a Lie group $Sp(2n, \mathbb{R})$, and $S \in Sp(2n, \mathbb{R})$ iff $S^\top J S = J$.
Three properties worth noting. First, $\det S = 1$, so volume and orientation are preserved. Second, the eigenvalues come in quartets: if $\lambda$ is an eigenvalue then so are $1/\lambda$, $\bar{\lambda}$ and $1/\bar{\lambda}$. This spectral symmetry stops eigenvalues from drifting into the unstable region without first colliding pairwise at the unit circle. And third, the dimension is $n(2n+1)$.
Geometric integration at its core is about keeping the numerical update on this group or near it despite the discretization error.
Multi-symplectic Methods for PDEs
Hamiltonian PDEs like the wave equation and Schrodinger often have multi-symplectic structure with symplectic forms in both time and space.
Standard symplectic integrators only handle the temporal part. Multi-symplectic integrators like the Preissman box scheme hold a local energy-momentum conservation law per spacetime cell and kill the unphysical spatial oscillations that otherwise mess up the wave-packet shape over long propagation.
Optimal Transport
Hamiltonian flows connect to optimal transport in a way I think deserves more attention. Brenier's theorem says the optimal map pushing a measure $\mu$ to $\nu$ under quadratic cost is the gradient of a convex function. In the Benamou-Brenier dynamic picture the optimal paths are geodesics in the Wasserstein space of probability measures.
Integrating these Wasserstein flows geometrically means holding onto displacement convexity of the energy. Symplectic discretizations of the OT problem keep the transport maps from leaking probability mass into regions where it should not go.
Why Automatic Differentiation Matters Here
Leapfrog needs $\nabla V(q)$. Finite differences like $\partial_i V \approx \left(V(q + \epsilon e_i) - V(q)\right) / \epsilon$ bring in $O(\epsilon)$ errors that break the symplectic structure. Even a small gradient error acts as a non-conservative force and wipes out the shadow Hamiltonian's stability guarantees.
Automatic differentiation in JAX and PyTorch computes gradients to machine precision and keeps the numerical vector field gradient-like by construction. This is arguably the single biggest practical advance that makes geometric integration work for high-dimensional models.
Historical notes
Hamiltonian mechanics goes back to William Rowan Hamilton’s formulation in 1833. The first explicit symplectic integrator which is Leapfrog or Stormer-Verlet showed up in De Vogelaere’s 1956 work and Loup Verlet independently rediscovered it for molecular dynamics in 1967 which gave it broad practical impact. Hybrid Monte Carlo came along in 1987 from Duane et al. and tied symplectic integration to Bayesian sampling. The next year Ge and Marsden proved their No-Go theorem which says no fixed-step integrator can be both symplectic and energy-conserving for non-integrable systems. The definitive textbook treatment came in 2006 with Hairer and Lubich and Wanner’s “Geometric Numerical Integration.” More recently Hoffman and Gelman introduced the No-U-Turn Sampler in 2014 and Greydanus et al. proposed Hamiltonian Neural Networks in 2019 and pulled geometric integration into deep learning.
Definitions
- Cotangent bundle ($T^*Q$): the space of all possible positions and momenta for a mechanical system. Each point $(q, p)$ specifies a complete state.
- Holonomic constraint: a restriction expressed purely in terms of coordinates, $g(q) = 0$, e.g. a fixed bond length. RATTLE enforces these within the symplectic framework.
- Poisson bracket ($\{F, G\}$): the Lie algebra operation on phase-space functions, defined as $\{F, G\} = \sum_i \left( \frac{\partial F}{\partial q^i} \frac{\partial G}{\partial p_i} - \frac{\partial F}{\partial p_i} \frac{\partial G}{\partial q^i} \right)$. Conservation laws correspond to functions whose Poisson bracket with $H$ vanishes.
- Shadow Hamiltonian ($\tilde H_h$): the exact conserved quantity of the discrete map. It differs from $H$ by $O(h^2)$ for leapfrog.
- Symplectic 2-form ($\omega$): the geometric object measuring oriented area in phase space. A map preserving $\omega$ is symplectic.
- Symplectic manifold: a smooth manifold equipped with a closed, non-degenerate 2-form. Phase space is the prototypical example.
References
1. Hairer, E., Lubich, C., & Wanner, G. (2006). “Geometric Numerical Integration”. The definitive reference for structure-preserving algorithms.
2. Leimkuhler, B., & Reich, S. (2004). “Simulating Hamiltonian Dynamics”. Bridge between molecular dynamics and numerical analysis.
3. Neal, R. M. (2011). “MCMC using Hamiltonian dynamics”. The foundational HMC paper for the machine learning community.
4. Marsden, J. E., & West, M. (2001). “Discrete mechanics and variational integrators”. The seminal paper on deriving integrators from the discrete action principle.
5. Munthe-Kaas, H. (1998). “High order Runge-Kutta methods on manifolds”. Introduced the RKMK methods for Lie group integration.
6. Bridges, T. J., & Reich, S. (2001). “Multi-symplectic integrators: numerical methods for Hamiltonian PDEs”. The primary source for multi-symplectic geometry in wave systems.
7. Ambrosio, L., Gigli, N., & Savare, G. (2008). “Gradient Flows in Metric Spaces”. Connection between Wasserstein geometry and optimal transport.