Martingale Theory

A martingale is a sequence of random variables in which the conditional expectation of the next value, given all past information, equals the present value. This conservation property $\mathbb{E}[X_{n+1} | \mathcal{F}_n] = X_n$ yields strong convergence results and sits at the foundation of a few big areas.

  1. Mathematical finance: the Efficient Market Hypothesis and option pricing.
  2. Sequential analysis: deciding when to stop a clinical trial or an A/B test via Wald's Identity.
  3. Modern statistics: in particular, concentration bounds for randomized algorithms via Azuma-Hoeffding.

What follows is a measure-theoretic walk through discrete-time martingales and covers convergence theorems and applications in finance and computer science.

1. Definitions and Measure-Theoretic Setup

Let $(\Omega, \mathcal{F}, P)$ be a probability space.

1.1 Information and Filtrations

We model how information accumulates over time with a filtration.

Definition (Filtration). A filtration is an increasing sequence of sub-$\sigma$-algebras $\{\mathcal{F}_n\}_{n \ge 0}$ with

$$\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \dots \subseteq \mathcal{F}_\infty \subseteq \mathcal{F}$$

where $\mathcal{F}_n$ represents the information available at time $n$.

  • Natural filtration. $\mathcal{F}_n = \sigma(X_0, \dots, X_n)$, the $\sigma$-algebra generated by the history of the process itself.
  • Dyadic filtration. On $\Omega = [0, 1)$ let $\mathcal{F}_n$ be generated by the intervals $[k 2^{-n}, (k+1) 2^{-n})$; as $n \to \infty$ the point $x$ is pinned down with infinite precision through its binary expansion.

Definition (Adapted Process). A sequence $X = (X_n)_{n \ge 0}$ is adapted to $\{\mathcal{F}_n\}$ when $X_n$ is $\mathcal{F}_n$-measurable for every $n$: the value of $X_n$ is determined by the information available at time $n$.

1.2 Martingales, Submartingales, Supermartingales

Definition. An adapted, integrable sequence $(X_n)$ is a martingale with respect to $\{\mathcal{F}_n\}$ when

  1. Integrability. $\mathbb{E}[|X_n|] < \infty$ for all $n$.
  2. Martingale property. $\mathbb{E}[X_{n+1} | \mathcal{F}_n] = X_n$ almost surely.

Swapping the equality for an inequality gives two cousins.

  • Submartingale ($\ge$). $\mathbb{E}[X_{n+1} | \mathcal{F}_n] \ge X_n$: the process tends to drift upward. By Jensen's inequality, this also covers convex functions of martingales.
  • Supermartingale ($\le$). $\mathbb{E}[X_{n+1} | \mathcal{F}_n] \le X_n$: the process tends to drift downward; this is the standard model of an unfavorable game.

1.3 Canonical Examples

  1. Simple Symmetric Random Walk. $S_n = \sum_{i=1}^n \xi_i$ where $\xi_i = \pm 1$ with probability $1/2$ each. Then $\mathbb{E}[S_{n+1} | \mathcal{F}_n] = S_n + \mathbb{E}[\xi_{n+1}] = S_n$, so $S_n$ is a martingale.
  2. Biased Random Walk. When $P(\xi_i = 1) = p > 1/2$ you get $\mathbb{E}[S_{n+1} | \mathcal{F}_n] = S_n + (2p-1) > S_n$, a submartingale. But the geometric process $M_n = ((1-p)/p)^{S_n}$ is a martingale: De Moivre's martingale.
  3. Polya's Urn. An urn holds $r$ red and $g$ green balls; you pick one at random and return it along with another of the same color. Let $X_n$ be the fraction of red balls at time $n$; then $X_n$ is a martingale. Because $X_n$ is bounded in $[0, 1]$ it converges almost surely, and the limit $X_\infty$ follows a Beta distribution rather than concentrating at a point.
  4. Likelihood Ratios. Let $P$ and $Q$ be probability measures and let $L_n = \frac{dQ|_{\mathcal{F}_n}}{dP|_{\mathcal{F}_n}}$ be the Radon-Nikodym derivative restricted to the first $n$ observations. Under the measure $P$, $(L_n)$ is a nonnegative martingale.
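As a quick numerical sanity check on Example 2, the sketch below (assuming NumPy; the function name is ours) simulates the biased walk and verifies that the mean of De Moivre's martingale stays at $M_0 = 1$ over time:

```python
import numpy as np

def de_moivre_check(p=0.6, n_steps=30, trials=200_000, seed=0):
    """Empirically check that M_n = ((1-p)/p)^{S_n} has constant mean:
    the martingale property implies E[M_n] = M_0 = 1 for all n."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([1, -1], size=(trials, n_steps), p=[p, 1 - p])
    S = np.cumsum(steps, axis=1)          # biased random walk paths
    M = ((1 - p) / p) ** S                # De Moivre's martingale
    return M.mean(axis=0)                 # should hover near 1.0 at every n

means = de_moivre_check()
print(means[::10])  # ~1.0 at each sampled time, up to Monte Carlo error
```

The variance of $M_n$ grows with $n$, so the later entries fluctuate more around 1 for a fixed number of trials.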

2. Doob’s Decomposition

Every submartingale splits uniquely into a martingale piece that holds the fluctuations and a predictable increasing piece that holds the drift. This is the discrete-time version of the Doob-Meyer Decomposition.

Theorem (Doob Decomposition).

Let $(X_n)$ be a submartingale. Then there is a unique decomposition $X_n = M_n + A_n$ where

  1. $(M_n)$ is a martingale.
  2. $(A_n)$ is a predictable increasing process ($A_n$ is $\mathcal{F}_{n-1}$-measurable) with $A_0 = 0$.

Proof. Set $\Delta A_n = \mathbb{E}[X_n - X_{n-1} | \mathcal{F}_{n-1}]$; because $X$ is a submartingale, $\Delta A_n \ge 0$. Let $A_n = \sum_{k=1}^n \Delta A_k$; then $A_n$ is predictable and increasing. Set $M_n = X_n - A_n$; then $\mathbb{E}[M_n - M_{n-1} | \mathcal{F}_{n-1}] = 0$.
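A concrete instance: for a simple symmetric walk $S_n$, the submartingale $X_n = S_n^2$ has predictable compensator $A_n = n$, since $\mathbb{E}[S_n^2 - S_{n-1}^2 | \mathcal{F}_{n-1}] = \mathbb{E}[2 S_{n-1}\xi_n + 1 | \mathcal{F}_{n-1}] = 1$. The sketch below (a Monte Carlo check, assuming NumPy) verifies that the martingale part $M_n = S_n^2 - n$ has mean zero:

```python
import numpy as np

def doob_decomposition_demo(n_steps=50, trials=100_000, seed=1):
    """For X_n = S_n^2 with S_n a simple symmetric walk, the Doob
    decomposition is X_n = M_n + A_n with A_n = n. Check that
    M_n = S_n^2 - n has mean E[M_n] = 0 at every time step."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([1, -1], size=(trials, n_steps))
    S = np.cumsum(steps, axis=1)
    M = S**2 - np.arange(1, n_steps + 1)   # martingale part of S_n^2
    return M.mean(axis=0)                  # ~0 at every time step

M_means = doob_decomposition_demo()
print(M_means[::10])
```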

2.5 Doob’s Maximal Inequalities

The next inequality bounds the fluctuations of a submartingale by its endpoint. Let $M_n^* = \max_{0 \le k \le n} |X_k|$.

Theorem (Doob's Submartingale Inequality). When $X$ is a non-negative submartingale, then for any $\lambda > 0$

$$P(M_n^* \ge \lambda) \le \frac{\mathbb{E}[X_n]}{\lambda}$$

Proof (Sketch). Let $E = \{M_n^* \ge \lambda\}$ and split $E$ by the first time the process crosses $\lambda$, so $E_k = \{X_k \ge \lambda,\ X_j < \lambda\ \forall j < k\}$. Because $X$ is a submartingale, $\mathbb{E}[X_n \mathbb{1}_{E_k}] \ge \mathbb{E}[X_k \mathbb{1}_{E_k}] \ge \lambda P(E_k)$, and summing over $k$ gives the result.

Corollary ($L^p$ Inequality). When $X_n$ is a martingale with $X_n \in L^p$ for $p > 1$, then

$$\|M_n^*\|_p \le \frac{p}{p-1} \|X_n\|_p$$

So the maximum of the process is controlled by the endpoint in $L^p$, and $L^p$ convergence of the endpoint upgrades to uniform convergence of the whole path.
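The maximal inequality can be checked numerically. The sketch below (assuming NumPy; names are ours) applies it to the non-negative submartingale $X_k = |S_k|$ for a simple symmetric walk and compares both sides:

```python
import numpy as np

def doob_maximal_check(n_steps=100, lam=15.0, trials=100_000, seed=2):
    """Check Doob's inequality for the non-negative submartingale
    X_k = |S_k|:  P(max_k |S_k| >= lam) <= E[|S_n|] / lam."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([1, -1], size=(trials, n_steps))
    S = np.cumsum(steps, axis=1)
    lhs = (np.abs(S).max(axis=1) >= lam).mean()   # tail of the running max
    rhs = np.abs(S[:, -1]).mean() / lam           # endpoint bound
    return lhs, rhs

lhs, rhs = doob_maximal_check()
print(f"P(max >= lam) = {lhs:.4f} <= bound = {rhs:.4f}")
```

The bound is not tight here; it is a worst-case guarantee over all submartingales, not a sharp estimate for this one.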

3. Stopping Times and Optional Stopping

A random variable $T: \Omega \to \mathbb{N} \cup \{\infty\}$ is a stopping time when $\{T \le n\} \in \mathcal{F}_n$ for all $n$. The decision to stop can rely only on present and past information.

  • Example. “Stop when $S_n \ge 10$” is a stopping time.
  • Non-example. “Stop at the time the process hits its maximum” requires knowledge of the future and is not a stopping time.

The stopped process $X_n^T = X_{T \wedge n}$ retains the martingale property, so when $X$ is a martingale, so is $X^T$. The deeper question is whether $\mathbb{E}[X_T] = \mathbb{E}[X_0]$.

Counter-example (Doubling Strategy). Let $S_n$ be a symmetric random walk starting at 0 and let $T = \inf\{n : S_n = 1\}$. Then $T < \infty$ almost surely and $S_T = 1$, so $\mathbb{E}[S_T] = 1 \ne \mathbb{E}[S_0] = 0$. The equality fails because $S_n$ can drop arbitrarily far negative before hitting 1, which violates the integrability conditions for optional stopping.

Theorem (Doob's Optional Stopping Theorem).

When $X_n$ is a martingale and $T$ is a stopping time, then $\mathbb{E}[X_T] = \mathbb{E}[X_0]$ when any of the following hold.

  1. $T$ is bounded ($T \le N$ a.s.).
  2. $X$ is uniformly bounded ($|X_n| \le K$ a.s.).
  3. $X$ is uniformly integrable and $T < \infty$ a.s.

3.5 Uniform Integrability (UI)

Uniform integrability is the condition that makes expectations converge, so that $\mathbb{E}[X_n] \to \mathbb{E}[X]$.

Definition. A family $\{X_i\}_{i \in I}$ is uniformly integrable when

$$\lim_{K \to \infty} \sup_{i \in I} \mathbb{E}[|X_i| \mathbb{1}_{|X_i| > K}] = 0$$

Uniform integrability prevents probability mass from escaping to infinity. In the doubling strategy the mass does escape: at any finite time there is a small probability of a large loss, and these contributions do not vanish uniformly.

Theorem. $X_n \to X$ in $L^1$ if and only if $X_n \to X$ in probability and $\{X_n\}$ is UI.

Because conditional expectation is an $L^1$ contraction, uniform integrability is the natural regularity condition for martingale convergence.

Doob’s UI Theorem. When $X_n$ is a martingale, the following are equivalent.

  1. $\{X_n\}$ is UI.
  2. $X_n$ converges in $L^1$ to a limit $X_\infty$.
  3. $X_n = \mathbb{E}[X_\infty | \mathcal{F}_n]$; the martingale is “closed”.

4. Martingale Convergence Theorems

The basic question is when a martingale converges to a limit.

Theorem (Doob's Martingale Convergence Theorem).

When $(X_n)$ is a supermartingale bounded in $L^1$, i.e. $\sup_n \mathbb{E}|X_n| < \infty$, then $X_n$ converges almost surely to a random variable $X_\infty$ with $\mathbb{E}|X_\infty| < \infty$.

The Upcrossing Inequality. The upcrossing inequality sits at the heart of the convergence proof. Let $U_N[a, b]$ count how many times the process crosses upward from below $a$ to above $b$ by time $N$. The crossing times are defined recursively.

  • $T_1 = \min\{n : X_n \le a\}$
  • $T_2 = \min\{n > T_1 : X_n \ge b\}$
  • $T_3 = \min\{n > T_2 : X_n \le a\}$

and so on; $U_N$ counts the completed upcrossings.

Doob’s Upcrossing Inequality.

$$(b - a)\, \mathbb{E}[U_N[a, b]] \le \mathbb{E}[(X_N - a)^-]$$

Proof Idea. Think of a gambling strategy that buys at level $a$ and sells at level $b$. Each completed upcrossing yields a profit of $b - a$, and the upcrossing intervals do not overlap. Because the underlying process is a supermartingale, i.e. an unfavorable game, the total expected profit is bounded above, and the right-hand side accounts for the worst-case loss at the final time.

Because $\mathbb{E}|X_N| < \infty$, the expected number of upcrossings is finite. So $X_n$ cannot oscillate between any two rationals $a < b$ infinitely often, and convergence follows.

4.5 Application: The Secretary Problem (Optimal Stopping)

A classic application of martingale theory, via Snell envelopes, is the Secretary Problem.

Setup. You interview $N$ candidates one after another and can rank each against the ones already seen. After each interview you must hire or reject, and a rejection is final.

Goal. Maximize the probability of picking the absolute best candidate.

Strategy. Reject the first $k-1$ candidates, then hire the first one better than everyone before. Let $M_n = P(\text{Win} \mid \text{Stop at } n)$. We build the Snell envelope, the smallest supermartingale dominating the payoff process. By backward induction, the optimal stopping rule is to skip roughly $N/e$ candidates, and the success probability tends to $1/e \approx 37\%$.
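The $1/e$ rule is easy to test by simulation. This sketch (assuming NumPy; the function name is ours) implements the skip-$N/e$ strategy and estimates the win probability:

```python
import numpy as np

def secretary_success(N=100, trials=20_000, seed=3):
    """Simulate the skip-N/e rule: observe the first k candidates,
    then hire the first candidate better than all seen so far.
    Success = hiring the overall best; expect ~1/e ≈ 0.37."""
    rng = np.random.default_rng(seed)
    k = int(round(N / np.e))
    wins = 0
    for _ in range(trials):
        ranks = rng.permutation(N)        # ranks[i] = quality of candidate i
        best_seen = ranks[:k].max()
        hired = None
        for i in range(k, N):
            if ranks[i] > best_seen:      # first candidate beating the sample
                hired = ranks[i]
                break
        if hired == N - 1:                # rank N-1 is the best candidate
            wins += 1
    return wins / trials

p_win = secretary_success()
print(f"win probability: {p_win:.3f} (1/e = {1/np.e:.3f})")
```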

5. Concentration Inequalities

One of the biggest modern uses of martingales is pinning down sharp concentration of random variables around their means.

Theorem (Azuma-Hoeffding Inequality).

Let $(X_n)$ be a martingale with bounded differences $|X_k - X_{k-1}| \le c_k$ almost surely. Then for any $t > 0$

$$P(|X_n - X_0| \ge t) \le 2 \exp\left( -\frac{t^2}{2 \sum_{k=1}^n c_k^2} \right)$$

5.1 Proof of Azuma-Hoeffding

The proof runs through the Chernoff bound trick.

  1. Exponential Markov. For any $\lambda > 0$, $P(X_n - X_0 \ge t) = P(e^{\lambda(X_n - X_0)} \ge e^{\lambda t}) \le e^{-\lambda t}\, \mathbb{E}[e^{\lambda(X_n - X_0)}]$.
  2. Telescoping. Write $X_n - X_0 = \sum_{k=1}^n D_k$ where $D_k = X_k - X_{k-1}$. Then $\mathbb{E}[e^{\lambda \sum_k D_k}] = \mathbb{E}\left[ e^{\lambda \sum_{k=1}^{n-1} D_k}\, \mathbb{E}[e^{\lambda D_n} | \mathcal{F}_{n-1}] \right]$.
  3. Hoeffding’s Lemma. Because $\mathbb{E}[D_n | \mathcal{F}_{n-1}] = 0$ and $D_n \in [-c_n, c_n]$, an interval of length $2c_n$, we get $\mathbb{E}[e^{\lambda D_n} | \mathcal{F}_{n-1}] \le e^{\lambda^2 c_n^2 / 2}$.
  4. Iteration. Apply this $n$ times to get $\mathbb{E}[e^{\lambda(X_n - X_0)}] \le \prod_{k=1}^n e^{\lambda^2 c_k^2 / 2} = e^{\frac{\lambda^2}{2} \sum_k c_k^2}$.
  5. Optimization. Pick $\lambda = \frac{t}{\sum_k c_k^2}$ to minimize the bound: $P \le \exp\left( -\lambda t + \frac{\lambda^2}{2} \sum_k c_k^2 \right) = \exp\left( -\frac{t^2}{2 \sum_k c_k^2} \right)$.
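The bound above can be compared against an empirical tail. A minimal sketch (assuming NumPy) using the simple symmetric walk, where $c_k = 1$:

```python
import numpy as np

def azuma_check(n=200, t=30.0, trials=100_000, seed=4):
    """Compare the empirical tail P(|S_n| >= t) for a simple symmetric
    walk (bounded differences c_k = 1) with the Azuma-Hoeffding bound
    2 exp(-t^2 / (2n))."""
    rng = np.random.default_rng(seed)
    S = rng.choice([1, -1], size=(trials, n)).sum(axis=1)
    empirical = (np.abs(S) >= t).mean()
    bound = 2 * np.exp(-t**2 / (2 * n))
    return empirical, bound

emp, bound = azuma_check()
print(f"empirical = {emp:.4f} <= Azuma bound = {bound:.4f}")
```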

5.2 The Doob Martingale and McDiarmid’s Inequality

To extend Azuma-Hoeffding to a general function of independent random variables we use the Doob martingale construction. Let $Z_1, \dots, Z_n$ be independent random variables and let $f(Z)$ be a function of interest. The Doob martingale is

$$M_k = \mathbb{E}[f(Z) | Z_1, \dots, Z_k]$$
  • $M_0 = \mathbb{E}[f(Z)]$ is the unconditional mean.
  • $M_n = f(Z)$ is the realized value.

The difference $D_k = M_k - M_{k-1}$ captures the information contributed by $Z_k$. When changing any one input $z_i$ shifts $f$ by at most $c_i$ (the bounded difference property), then $|D_k| \le c_k$.

Theorem (McDiarmid’s Inequality). For any function $f$ with the bounded difference property with constants $c_1, \dots, c_n$,

$$P(|f(Z) - \mathbb{E}[f(Z)]| \ge t) \le 2 \exp\left( -\frac{2t^2}{\sum_i c_i^2} \right)$$

Application.

  • Chromatic Number of Random Graphs. Changing one edge shifts the chromatic number by at most 1, so $\chi(G_{n,p})$ concentrates tightly around its mean.
  • Machine Learning. The generalization error of a classifier concentrates because changing one training example has a bounded effect on the output; this is where stability-based bounds come from.
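The simplest sanity check for McDiarmid's inequality takes $f$ to be the empirical mean of $n$ bounded variables, where each $c_i = 1/n$. A minimal sketch (assuming NumPy):

```python
import numpy as np

def mcdiarmid_check(n=100, t=0.1, trials=100_000, seed=5):
    """f(Z) = mean of n iid Uniform(0,1) variables. Changing one Z_i
    moves f by at most c_i = 1/n, so sum c_i^2 = 1/n and McDiarmid
    gives P(|f - E f| >= t) <= 2 exp(-2 t^2 n)."""
    rng = np.random.default_rng(seed)
    means = rng.random((trials, n)).mean(axis=1)
    empirical = (np.abs(means - 0.5) >= t).mean()
    bound = 2 * np.exp(-2 * t**2 * n)
    return empirical, bound

emp_m, bound_m = mcdiarmid_check()
print(f"empirical = {emp_m:.5f} <= McDiarmid bound = {bound_m:.4f}")
```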

6. Applications in Finance

The main use of martingale theory in mathematical finance is option pricing. A market holds risky assets (stocks) and a risk-free asset (a bond).

First Fundamental Theorem of Asset Pricing. The market has no arbitrage exactly when there is a risk-neutral measure $\mathbb{Q}$, equivalent to the physical measure $\mathbb{P}$, under which the discounted stock price is a $\mathbb{Q}$-martingale.

$$S_n = \mathbb{E}_{\mathbb{Q}}[e^{-r} S_{n+1} | \mathcal{F}_n]$$

Under $\mathbb{Q}$ the expected return of the stock equals the risk-free rate $r$, and the real-world drift $\mu$ drops out.

6.1 Girsanov Theorem (Change of Measure)

The move from $\mathbb{P}$ to $\mathbb{Q}$ happens by re-weighting probabilities. Let $Z_n = \frac{d\mathbb{Q}}{d\mathbb{P}}\big|_{\mathcal{F}_n}$ be the Radon-Nikodym derivative process; $Z_n$ is a positive martingale with $\mathbb{E}[Z_n] = 1$.

Girsanov Theorem. When $M_n$ is a martingale under $\mathbb{P}$, then $\tilde{M}_n = M_n - \sum_{k \le n} \frac{1}{Z_{k-1}} \mathbb{E}[\Delta M_k \Delta Z_k | \mathcal{F}_{k-1}]$ is a martingale under $\mathbb{Q}$. This lets you strip out the drift: a biased random walk modeling a stock with drift becomes a martingale under the new measure once the probabilities of up and down moves are adjusted. Drift removal through a change of measure underlies the Black-Scholes formula.

6.2 Deriving Black-Scholes (Heuristic)

In the continuous-time limit of the binomial model, as $\Delta t \to 0$, the stock price follows

$$dS_t = \mu S_t\, dt + \sigma S_t\, dW_t$$

where $W_t$ is a Brownian motion, the scaling limit of a random walk. Under the physical measure $\mathbb{P}$ the drift is $\mu$. Passing to the risk-neutral measure $\mathbb{Q}$ replaces $\mu$ with the risk-free rate $r$:

$$dS_t = r S_t\, dt + \sigma S_t\, d\tilde{W}_t$$

The option price $V(S, t)$ is the discounted expected payoff under $\mathbb{Q}$.

$$V(S_t, t) = e^{-r(T-t)}\, \mathbb{E}_{\mathbb{Q}}[\max(S_T - K, 0) | \mathcal{F}_t]$$

Working this expectation out gives the Black-Scholes PDE.

$$\frac{\partial V}{\partial t} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S \frac{\partial V}{\partial S} - rV = 0$$

The stock’s expected return $\mu$ does not appear in this equation; this reflects the fact that the option’s risk can be hedged away completely.
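As an illustrative check (not part of the derivation above), the sketch below prices a European call two ways: the closed-form Black-Scholes formula and a risk-neutral Monte Carlo average of the discounted payoff. The two should agree up to sampling error; the parameter values are arbitrary.

```python
import math
import numpy as np

def bs_call(S0, K, r, sigma, T):
    """Closed-form Black-Scholes price of a European call."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))  # standard normal CDF
    return S0 * N(d1) - K * math.exp(-r * T) * N(d2)

def mc_call(S0, K, r, sigma, T, paths=500_000, seed=6):
    """Risk-neutral Monte Carlo: simulate S_T under Q (drift r, not mu)
    and discount the expected payoff."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * Z)
    return math.exp(-r * T) * np.maximum(ST - K, 0).mean()

closed = bs_call(100, 100, 0.05, 0.2, 1.0)
mc = mc_call(100, 100, 0.05, 0.2, 1.0)
print(f"closed-form: {closed:.3f}, Monte Carlo: {mc:.3f}")
```

Note that $\mu$ appears nowhere in either computation, exactly as the PDE predicts.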

7. Randomized Algorithms and Simulation

Martingales give a natural framework for analyzing the runtime of randomized algorithms. Take a random walk on a graph $G = (V, E)$ and let $T$ be the hitting time of a vertex set. We build martingales $M_n = \phi(X_n) - n$ to estimate $\mathbb{E}[T]$.

7.1 Simulation: Wald’s Identity

Let $X_i$ be i.i.d. with mean $\mu$, let $S_n = \sum_{i=1}^n X_i$, and let $T$ be a stopping time with $\mathbb{E}[T] < \infty$.

Wald’s Identity. $\mathbb{E}[S_T] = \mu\, \mathbb{E}[T]$.

The simulation below runs a biased random walk stopped at thresholds $-A$ and $+B$ and checks that $M_n = S_n - n\mu$ is a martingale and that Wald’s identity holds.

```python
import numpy as np

def simulate_wald(p=0.55, A=10, B=10, trials=1000):
    """
    Biased random walk X_i = +/- 1 with P(X=1) = p, drift mu = 2p - 1.
    Stop when S_n reaches -A or +B.
    """
    mu = 2*p - 1
    stopping_times = []
    final_values = []
    for _ in range(trials):
        S = 0
        t = 0
        while -A < S < B:
            step = 1 if np.random.rand() < p else -1
            S += step
            t += 1
        stopping_times.append(t)
        final_values.append(S)
    E_T = np.mean(stopping_times)
    E_S_T = np.mean(final_values)
    print(f"p={p}, Drift mu={mu:.2f}")
    print(f"Simulated E[T]: {E_T:.2f}")
    print(f"Simulated E[S_T]: {E_S_T:.2f}")
    print(f"Wald Check (mu * E[T]): {mu * E_T:.2f}")
    # Analytical formula for Gambler's Ruin:
    # P(reach B) = (1 - (q/p)^A) / (1 - (q/p)^(A+B))
    q = 1 - p
    if p != 0.5:
        ratio = q/p
        prob_B = (1 - ratio**A) / (1 - ratio**(A+B))
        analytical_E_S = B * prob_B + (-A) * (1 - prob_B)
        print(f"Analytical E[S_T]: {analytical_E_S:.2f}")

# simulate_wald(p=0.55, A=10, B=10)
```

The simulation confirms $\mathbb{E}[S_T] = \mathbb{E}[T] \cdot \mu$ as long as the stopping time has finite expectation.

8. Conclusion

The power of martingales comes from how universal they are.

  1. Convergence. An $L^1$-bounded supermartingale converges almost surely.
  2. Concentration. Under bounded differences, sums show sub-Gaussian concentration.
  3. Pricing. No arbitrage is equivalent to the existence of a martingale measure.

Whether you are looking at QuickSort runtime or barrier option pricing the martingale property gives a single framework for pulling precise quantitative information out of stochastic processes.


9. Polya’s Urn Scheme

Polya’s Urn gives a clean picture of martingale convergence.

Setup. Start with 1 red and 1 green ball. At each step $n$, draw a ball at random and return it to the urn along with one more ball of the same color. Let $X_n$ be the fraction of red balls at time $n$; at the start, $X_0 = 1/2$.

Claim. $X_n$ is a martingale. Let $R_n$ and $G_n$ be the numbers of red and green balls, with total $T_n = n + 2$.

$$P(R_{n+1} = R_n + 1 | \mathcal{F}_n) = \frac{R_n}{T_n} = X_n$$

$$\mathbb{E}[X_{n+1} | \mathcal{F}_n] = X_n \frac{R_n+1}{T_n+1} + (1-X_n) \frac{R_n}{T_n+1} = \frac{R_n(R_n + 1 + G_n)}{(R_n+G_n)(T_n+1)} = \frac{R_n}{T_n} = X_n$$

Convergence. Because $X_n$ is bounded in $[0, 1]$, the Martingale Convergence Theorem gives $X_n \to X_\infty$ almost surely.

Limit Distribution. The limit $X_\infty$ is not a constant: by induction one checks that $R_n$ is uniform on $\{1, \dots, n+1\}$, so $X_\infty$ is uniform on $[0, 1]$, i.e. Beta(1, 1). More generally, starting with $a$ red and $b$ green balls yields a Beta($a$, $b$) limit. The Polya urn is a standard model for path dependence in economics and reinforcement learning.
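The uniform limit is easy to see numerically. This sketch (assuming NumPy; names are ours) runs many urns in parallel and checks the mean and variance of the limiting fraction against Uniform(0,1), which has mean $1/2$ and variance $1/12$:

```python
import numpy as np

def polya_limits(n_steps=2000, trials=20_000, seed=7):
    """Simulate Polya's urn starting from 1 red, 1 green and return the
    fraction of red balls after n_steps draws; the limit X_inf should be
    approximately Uniform(0,1)."""
    rng = np.random.default_rng(seed)
    red = np.ones(trials)
    total = 2 * np.ones(trials)
    for _ in range(n_steps):
        draw_red = rng.random(trials) < red / total   # draw a ball
        red += draw_red                               # add one of same color
        total += 1
    return red / total

X = polya_limits()
print(f"mean = {X.mean():.3f} (expect 0.500), var = {X.var():.3f} (expect 0.083)")
```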


The Radon-Nikodym Derivative

Conditional expectation $\mathbb{E}[X | \mathcal{G}]$ is defined as the unique $\mathcal{G}$-measurable random variable $Z$ such that

$$\int_A Z\, dP = \int_A X\, dP \quad \forall A \in \mathcal{G}$$

and the Radon-Nikodym theorem guarantees that such a $Z$ exists. Let $\mu$ and $\nu$ be two measures on $(\Omega, \mathcal{F})$. We say $\nu$ is absolutely continuous with respect to $\mu$, written $\nu \ll \mu$, when $\mu(A) = 0$ forces $\nu(A) = 0$.

Theorem. When $\nu \ll \mu$ and both are $\sigma$-finite, there is a unique measurable function $f = \frac{d\nu}{d\mu}$ with

$$\nu(A) = \int_A f\, d\mu$$

Martingale connection. Let $(\Omega, \mathcal{F}, \mathbb{Q})$ and $(\Omega, \mathcal{F}, \mathbb{P})$ be probability measures. The likelihood ratio process $L_n = \frac{d\mathbb{Q}|_{\mathcal{F}_n}}{d\mathbb{P}|_{\mathcal{F}_n}}$ is always a non-negative $\mathbb{P}$-martingale. When $L_n \to L_\infty$ with $\mathbb{E}[L_\infty] = 1$, we get $\mathbb{Q} \ll \mathbb{P}$ on $\mathcal{F}_\infty$. When $L_n \to 0$, the measures $\mathbb{Q}$ and $\mathbb{P}$ are mutually singular in the limit; this is Kakutani’s Dichotomy.


Backward Martingales

Take a decreasing sequence of $\sigma$-algebras $\{\mathcal{G}_{-n}\}_{n \ge 0}$ with $\mathcal{G}_0 \supseteq \mathcal{G}_{-1} \supseteq \dots \supseteq \mathcal{G}_{-\infty} = \cap_n \mathcal{G}_{-n}$. A sequence $X_{-n}$ is a backward martingale when $\mathbb{E}[X_{-n} | \mathcal{G}_{-n-1}] = X_{-n-1}$.

Theorem. Every backward martingale converges almost surely and in $L^1$.

$$X_{-n} \to X_{-\infty} = \mathbb{E}[X_0 | \mathcal{G}_{-\infty}]$$

This connects to the strong law of large numbers. Let $S_n = X_1 + \dots + X_n$ where the $X_i$ are i.i.d., and let $\mathcal{G}_{-n} = \sigma(S_n, S_{n+1}, \dots)$, the symmetric $\sigma$-algebra. One checks that $\mathbb{E}[X_1 | \mathcal{G}_{-n}] = S_n / n$. Because this is a backward martingale, $S_n/n$ converges almost surely to $\mathbb{E}[X_1 | \text{Tail}]$, and by Kolmogorov’s 0-1 Law the tail $\sigma$-algebra is trivial, so the limit is the constant $\mu$.
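The convergence $S_n/n \to \mu$ is easy to watch numerically. A minimal sketch (assuming NumPy; names are ours) with $\pm 1$ steps of mean $\mu = 0.3$:

```python
import numpy as np

def lln_demo(mu=0.3, n=200_000, seed=8):
    """Track S_n / n for i.i.d. +/-1 steps with mean mu; the
    backward-martingale argument predicts convergence to mu."""
    rng = np.random.default_rng(seed)
    X = rng.choice([1, -1], size=n, p=[(1 + mu) / 2, (1 - mu) / 2])
    running = np.cumsum(X) / np.arange(1, n + 1)
    return running[[99, 9_999, n - 1]]   # S_n/n at n = 100, 10^4, 2*10^5

vals = lln_demo()
print(vals)  # successive values settle toward 0.3
```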


Proof of the Optional Stopping Theorem

We prove the Optional Stopping Theorem in the bounded case.

Theorem. When $M_n$ is a martingale and $T \le K$ is a bounded stopping time, then $\mathbb{E}[M_T] = \mathbb{E}[M_0]$.

Proof. Write the stopped value as a sum of increments.

$$M_T = M_0 + \sum_{n=1}^K (M_n - M_{n-1}) \mathbb{1}_{n \le T}$$

The event $\{n \le T\}$ is the complement of $\{T < n\} = \{T \le n-1\}$. Because $\{T \le n-1\} \in \mathcal{F}_{n-1}$, the complement $\{n \le T\}$ also lies in $\mathcal{F}_{n-1}$. This means the process $H_n = \mathbb{1}_{n \le T}$ is predictable. Now take expectations.

$$\mathbb{E}[M_T - M_0] = \mathbb{E}\left[ \sum_{n=1}^K (M_n - M_{n-1}) H_n \right]$$

By linearity of expectation (the sum is finite),

$$= \sum_{n=1}^K \mathbb{E}[(M_n - M_{n-1}) H_n]$$

Use the tower property of conditional expectation.

$$\mathbb{E}[(M_n - M_{n-1}) H_n] = \mathbb{E}\big[ \mathbb{E}[(M_n - M_{n-1}) H_n | \mathcal{F}_{n-1}] \big]$$

Because $H_n$ is $\mathcal{F}_{n-1}$-measurable,

$$= \mathbb{E}\big[ H_n\, \mathbb{E}[M_n - M_{n-1} | \mathcal{F}_{n-1}] \big]$$

and by the martingale property $\mathbb{E}[M_n - M_{n-1} | \mathcal{F}_{n-1}] = 0$. So every term in the sum vanishes and $\mathbb{E}[M_T] = \mathbb{E}[M_0]$.


Origins

Paul Lévy introduced the main ideas behind martingales in 1934 while working on sums of dependent random variables, though he did not use the name. Jean Ville coined the term “martingale” in 1939, as a reference to a class of gambling strategies, and proved the first supermartingale inequality. Joseph Doob’s 1953 book “Stochastic Processes” established the modern theory, covering the decomposition theorem, the convergence theorem, and optional stopping. Hoeffding (1963) and Azuma (1967) proved the bounded-difference concentration inequality, which became a workhorse in combinatorics and computer science. The link to finance came together with Black and Scholes in 1973, when risk-neutral pricing recast option valuation as a martingale expectation. McDiarmid extended concentration to general functions of independent variables in 1989, making it a standard tool in learning theory and algorithm analysis.