Martingale Theory

A martingale is often described intuitively as a “fair game”. Rigorously, it is a sequence of random variables where the conditional expectation of the next value, given all prior information, is equal to the present value. This simple conservation property ($\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n$) yields some of the most powerful convergence results in probability theory. It is the mathematical foundation of:

  1. Mathematical Finance: The Efficient Market Hypothesis and Option Pricing (Fundamental Theorem of Asset Pricing).
  2. Sequential Analysis: Deciding when to stop a clinical trial or A/B test (Wald’s Identity).
  3. Modern Statistics: Proving concentration bounds for complex randomized algorithms (Azuma-Hoeffding).

These notes cover the measure-theoretic foundations of discrete-time martingales, proving the key convergence theorems and exploring applications in finance and computer science.

1. Definitions & Measure Theoretic Setup

Let $(\Omega, \mathcal{F}, P)$ be a probability space.

1.1 Information and Filtrations

Randomness evolves over time. We model the accumulation of information using a Filtration. Definition (Filtration): A filtration is an increasing sequence of sub-$\sigma$-algebras $\{\mathcal{F}_n\}_{n \ge 0}$ such that:

$$\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \dots \subseteq \mathcal{F}_\infty \subseteq \mathcal{F}$$

Intuitively, $\mathcal{F}_n$ represents “all information available at time $n$”.

  • Natural Filtration: $\mathcal{F}_n = \sigma(X_0, \dots, X_n)$. The history of the process itself.
  • Dyadic Filtration: On $\Omega = [0, 1)$, let $\mathcal{F}_n$ be generated by the intervals $[k 2^{-n}, (k+1) 2^{-n})$. As $n \to \infty$, we know the point $x$ with infinite precision (binary expansion).

Definition (Adapted Process): A sequence $X = (X_n)_{n \ge 0}$ is adapted to $\{\mathcal{F}_n\}$ if $X_n$ is $\mathcal{F}_n$-measurable for all $n$. (You know the value of $X_n$ at time $n$.)

1.2 Martingales, Submartingales, Supermartingales

Definition: An adapted sequence $X_n$ is a martingale with respect to $\{\mathcal{F}_n\}$ if:

  1. Integrability: $\mathbb{E}[|X_n|] < \infty$ for all $n$.
  2. Martingale Property: $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n$ almost surely.

If we replace the equality with an inequality:

  • Submartingale ($\ge$): $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] \ge X_n$. Represents a favorable game (your expected wealth increases). Also models convex functions of martingales (Jensen’s inequality).
  • Supermartingale ($\le$): $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] \le X_n$. Represents an unfavorable game (casino).

1.3 Canonical Examples

  1. Simple Symmetric Random Walk: $S_n = \sum_{i=1}^n \xi_i$ where $\xi_i = \pm 1$ with probability $1/2$. $\mathbb{E}[S_{n+1} \mid \mathcal{F}_n] = S_n + \mathbb{E}[\xi_{n+1}] = S_n$. (Martingale.)
  2. Biased Random Walk: If $P(\xi_i = 1) = p > 1/2$, then $\mathbb{E}[S_{n+1} \mid \mathcal{F}_n] = S_n + (2p-1) > S_n$. (Submartingale.) However, the geometric process $M_n = ((1-p)/p)^{S_n}$ is a Martingale (De Moivre’s Martingale); a numerical check appears after this list.
  3. Polya’s Urn: An urn has $r$ red and $g$ green balls. Pick one uniformly at random, return it along with another ball of the same color. Let $X_n$ be the fraction of red balls at time $n$. $X_n$ is a Martingale. Since $X_n$ is bounded in $[0, 1]$, it converges. Remarkably, the limit $X_\infty$ is not deterministic but follows a Beta distribution.
  4. Likelihood Ratios: Let $P$ and $Q$ be probability measures. Let $L_n = \frac{dQ|_{\mathcal{F}_n}}{dP|_{\mathcal{F}_n}}$ be the Radon-Nikodym derivative restricted to the first $n$ observations. Under the measure $P$, $L_n$ is a nonnegative Martingale.
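To see Example 2 in action, here is a minimal Monte Carlo sketch (assuming NumPy; the function name and parameters are illustrative). It checks that the sample mean of $M_n = ((1-p)/p)^{S_n}$ stays near $M_0 = 1$ across time, as the martingale property demands.

```python
import numpy as np

def check_de_moivre(p=0.6, n_steps=30, trials=200_000, seed=0):
    """Monte Carlo check that E[M_n] = M_0 = 1 for De Moivre's martingale (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    ratio = (1 - p) / p
    # Simulate all increments at once: +1 with probability p, -1 otherwise.
    steps = np.where(rng.random((trials, n_steps)) < p, 1, -1)
    S = np.cumsum(steps, axis=1)            # S_1, ..., S_n for each path
    for n in (5, 15, 30):
        M_n = ratio ** S[:, n - 1]          # M_n = ((1-p)/p)^{S_n}
        print(f"n={n:2d}   sample mean of M_n ≈ {M_n.mean():.4f}   (should stay ≈ 1)")

check_de_moivre()
```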

2. Doob’s Decomposition

Every submartingale (favorable game) can be uniquely decomposed into a purely random martingale part (fluctuations) and a predictable increasing part (drift). This is the discrete-time analogue of the Doob-Meyer Decomposition.

Theorem (Doob Decomposition).

Let $(X_n)$ be a submartingale. Then there exists a unique decomposition $X_n = M_n + A_n$ where:

  1. $(M_n)$ is a martingale.
  2. $(A_n)$ is a predictable increasing process ($A_n$ is $\mathcal{F}_{n-1}$-measurable), with $A_0 = 0$.

Proof: Define $\Delta A_n = \mathbb{E}[X_n - X_{n-1} \mid \mathcal{F}_{n-1}]$. Since $X$ is a submartingale, $\Delta A_n \ge 0$. Let $A_n = \sum_{k=1}^n \Delta A_k$. Clearly $A_n$ is predictable and increasing. Define $M_n = X_n - A_n$. Then $\mathbb{E}[M_n - M_{n-1} \mid \mathcal{F}_{n-1}] = 0$, so $M_n$ is a martingale. Uniqueness follows because the difference of two such decompositions would be a predictable martingale starting at $0$, which must vanish identically.
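A quick worked example: for the simple symmetric random walk $S_n$, the process $X_n = S_n^2$ is a submartingale, and

$$\Delta A_n = \mathbb{E}[S_n^2 - S_{n-1}^2 \mid \mathcal{F}_{n-1}] = \mathbb{E}[2 S_{n-1}\xi_n + \xi_n^2 \mid \mathcal{F}_{n-1}] = 1,$$

so $A_n = n$ and the martingale part is $M_n = S_n^2 - n$: the compensator records the linear growth of the walk’s variance.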

2.5 Doob’s Maximal Inequalities

How much can a martingale fluctuate along the way? Doob’s inequality bounds the tail probability of the running maximum. Let $M_n^* = \max_{0 \le k \le n} |X_k|$. Theorem (Doob’s Submartingale Inequality): If $X$ is a non-negative submartingale, then for any $\lambda > 0$:

$$P(M_n^* \ge \lambda) \le \frac{\mathbb{E}[X_n]}{\lambda}$$

Proof (Sketch): Let $E = \{M_n^* \ge \lambda\}$. Partition $E$ according to the first time the process crosses $\lambda$: $E_k = \{X_k \ge \lambda,\ X_j < \lambda \ \forall j < k\}$. Since $X$ is a submartingale, $\mathbb{E}[X_n \mathbb{1}_{E_k}] \ge \mathbb{E}[X_k \mathbb{1}_{E_k}] \ge \lambda P(E_k)$. Summing over $k$ yields the result.

Corollary ($L^p$ Inequality): If $X_n$ is a martingale and $X_n \in L^p$ for $p > 1$, then:

$$\|M_n^*\|_p \le \frac{p}{p-1} \|X_n\|_p$$

This says the maximum is controlled by the endpoint. It allows extending $L^p$ convergence from the endpoint to uniform convergence of the whole path.
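A quick sanity check (a sketch assuming NumPy; names are illustrative): for a simple symmetric random walk, $|S_k|$ is a non-negative submartingale, so the empirical probability that the running maximum exceeds $\lambda$ must sit below $\mathbb{E}|S_n|/\lambda$.

```python
import numpy as np

def check_doob_maximal(n=200, lam=20, trials=100_000, seed=1):
    """Compare P(max_k |S_k| >= lam) with the Doob bound E|S_n| / lam."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([-1, 1], size=(trials, n))
    S = np.cumsum(steps, axis=1)
    running_max = np.abs(S).max(axis=1)          # M_n^* for each path
    lhs = (running_max >= lam).mean()            # empirical tail probability
    rhs = np.abs(S[:, -1]).mean() / lam          # E|S_n| / lambda
    print(f"P(M_n^* >= {lam}) ≈ {lhs:.4f}   Doob bound ≈ {rhs:.4f}")

check_doob_maximal()
```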

3. Stopping Times & Optional Stopping

A random variable $T: \Omega \to \mathbb{N} \cup \{\infty\}$ is a stopping time if $\{T \le n\} \in \mathcal{F}_n$ for all $n$. The decision to stop depends only on the present and past, not the future.

  • Example: “Stop when $S_n \ge 10$” is a stopping time.
  • Example: “Stop at the peak value” is NOT a stopping time (requires future knowledge).

Do martingales maintain their “fairness” if simply stopped? Yes. Let $X_n^T = X_{T \wedge n}$ be the stopped process. $X^T$ is always a martingale if $X$ is. However, does $\mathbb{E}[X_T] = \mathbb{E}[X_0]$ hold? Counter-example (Doubling Strategy): Let $S_n$ be a symmetric random walk starting at $0$. Let $T = \inf\{n : S_n = 1\}$. Then $T < \infty$ almost surely and $S_T = 1$, so $\mathbb{E}[S_T] = 1 \ne \mathbb{E}[S_0] = 0$. The problem is that $S_n$ can become arbitrarily negative before hitting $1$ (lack of uniform integrability).
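The failure is easy to see numerically. The sketch below (assumptions: NumPy, and a cap `horizon` on each path, since $\mathbb{E}[T] = \infty$) averages $S_{T \wedge N}$ over many runs: the stopped mean stays near $0$ for every horizon $N$, even though almost every path eventually reaches $+1$, because the rare still-running paths carry large negative values.

```python
import numpy as np

def stopped_walk_mean(horizon, trials=50_000, seed=2):
    """Mean of S_{T ∧ horizon}, where T is the first hitting time of +1."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        S, t = 0, 0
        while S < 1 and t < horizon:          # stop at +1 or at the horizon
            S += 1 if rng.random() < 0.5 else -1
            t += 1
        total += S
    return total / trials

for N in (10, 100, 1000):
    print(f"horizon N={N:5d}   E[S_(T ∧ N)] ≈ {stopped_walk_mean(N):+.3f}")
```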

Theorem (Doob's Optional Stopping Theorem).

If $X_n$ is a martingale and $T$ is a stopping time, then $\mathbb{E}[X_T] = \mathbb{E}[X_0]$ if any of the following hold:

  1. $T$ is bounded ($T \le N$ a.s.).
  2. $X$ is uniformly bounded ($|X_n| \le K$ a.s.).
  3. $X$ is Uniformly Integrable and $T < \infty$ a.s.

3.5 Uniform Integrability (UI)

The “correct” condition for convergence of expectations ($\mathbb{E}[X_n] \to \mathbb{E}[X]$) is Uniform Integrability. Definition: A family $\{X_i\}_{i \in I}$ is Uniformly Integrable if:

$$\lim_{K \to \infty} \sup_{i \in I} \mathbb{E}\big[|X_i|\, \mathbb{1}_{\{|X_i| > K\}}\big] = 0$$

This rules out “mass escaping to infinity”. For the doubling strategy, the mass escapes: at any time $n$, there is a tiny probability of a huge loss. Theorem: $X_n \to X$ in $L^1$ $\iff$ $X_n \to X$ in probability AND $\{X_n\}$ is UI. Since martingales are built from conditional expectations, and conditional expectation is an $L^1$ contraction, UI is the natural setting for martingale convergence. Doob’s UI Theorem: If $X_n$ is a martingale, the following are equivalent:

  1. $\{X_n\}$ is UI.
  2. $X_n$ converges in $L^1$ to a limit $X_\infty$.
  3. $X_n = \mathbb{E}[X_\infty \mid \mathcal{F}_n]$ (it is a “closed” martingale).

4. Martingale Convergence Theorems

When does a martingale converge to a limit?

Theorem (Doob's Martingale Convergence Theorem).

If $(X_n)$ is a supermartingale bounded in $L^1$ (i.e., $\sup_n \mathbb{E}|X_n| < \infty$), then $X_n$ converges almost surely to a random variable $X_\infty$ with $\mathbb{E}|X_\infty| < \infty$.

Key Tool: The Upcrossing Inequality. This is the heart of the convergence proof. Let $U_N[a, b]$ be the number of times the process crosses from below $a$ to above $b$ by time $N$. We define “crossing times” recursively:

  • $T_1 = \min\{n : X_n \le a\}$
  • $T_2 = \min\{n > T_1 : X_n \ge b\}$
  • $T_3 = \min\{n > T_2 : X_n \le a\}$, and so on. $U_N$ counts the completed up-trips. Doob’s Upcrossing Inequality:
$$(b - a)\, \mathbb{E}[U_N[a, b]] \le \mathbb{E}[(X_N - a)^-]$$

Proof Intuition: Think of a gambling strategy: “buy low (at $a$), sell high (at $b$)”. Every completed upcrossing nets a profit of $b - a$. Since distinct upcrossing intervals don’t overlap, and you are betting on a supermartingale (an unfavorable game), your total expected profit is bounded; the right-hand side measures the “risk”, the potential loss from an incomplete final crossing. Since $\mathbb{E}|X_N| < \infty$, the number of crossings is finite a.s. Thus $X_n$ cannot oscillate between any pair of rationals $a < b$ forever: it must converge.
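The inequality can be checked empirically. The sketch below (assuming NumPy; names are illustrative) uses a downward-biased walk with $p = 0.4$, which is a supermartingale, counts completed upcrossings of $[a, b] = [-2, 2]$, and compares $(b-a)\,\mathbb{E}[U_N]$ against $\mathbb{E}[(X_N - a)^-]$.

```python
import numpy as np

def count_upcrossings(path, a, b):
    """Count completed upcrossings of [a, b]: trips from <= a up to >= b."""
    count, below = 0, False
    for x in path:
        if not below and x <= a:
            below = True              # crossed under a: an upcrossing may start
        elif below and x >= b:
            count += 1                # completed a trip from <= a to >= b
            below = False
    return count

def check_upcrossing_inequality(p=0.4, n=100, a=-2, b=2, trials=20_000, seed=3):
    rng = np.random.default_rng(seed)
    lhs = rhs = 0.0
    for _ in range(trials):
        steps = np.where(rng.random(n) < p, 1, -1)
        path = np.cumsum(steps)       # supermartingale: downward-biased walk
        lhs += (b - a) * count_upcrossings(path, a, b)
        rhs += max(0.0, -(path[-1] - a))   # (X_N - a)^-
    print(f"(b-a) E[U_N] ≈ {lhs/trials:.3f}   E[(X_N - a)^-] ≈ {rhs/trials:.3f}")

check_upcrossing_inequality()
```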

4.5 Application: The Secretary Problem (Optimal Stopping)

A classic application of martingale theory (Snell Envelopes) is the Secretary Problem. Setup: You interview $N$ candidates sequentially. You can rank each candidate only relative to those already seen. After each interview, you must Hire or Reject; rejection is final. Goal: Maximize the probability of selecting the absolute best candidate. Strategy: Reject the first $k-1$ candidates, then hire the first one better than all previous. Let $M_n = P(\text{Win} \mid \text{Stop at } n)$. We construct the Snell Envelope (the smallest supermartingale dominating the payoff process). Using backward induction, one can show the optimal stopping rule is to skip roughly $N/e$ candidates. The success probability approaches $1/e \approx 37\%$.
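A minimal simulation sketch of the skip-$N/e$ rule (assuming NumPy; names and parameters are illustrative); the empirical win rate should hover near $1/e \approx 0.368$.

```python
import numpy as np

def secretary_win_rate(N=100, trials=50_000, seed=4):
    """Estimate P(hire the best) under the classic skip-N/e threshold rule."""
    rng = np.random.default_rng(seed)
    k = int(N / np.e)                     # observe the first k candidates, hire none
    wins = 0
    for _ in range(trials):
        ranks = rng.permutation(N)        # ranks[i] = quality of candidate i (N-1 is best)
        best_seen = ranks[:k].max() if k > 0 else -1
        hired = None
        for i in range(k, N):
            if ranks[i] > best_seen:      # first candidate better than all previously seen
                hired = ranks[i]
                break
        if hired == N - 1:                # did we hire the overall best?
            wins += 1
    return wins / trials

print(f"Empirical win probability: {secretary_win_rate():.3f}   (1/e ≈ {1/np.e:.3f})")
```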

5. Concentration Inequalities

One of the most modern applications of martingales is proving that random variables concentrate sharply around their means.

Theorem (Azuma-Hoeffding Inequality).

Let $(X_n)$ be a martingale with bounded differences $|X_k - X_{k-1}| \le c_k$ almost surely. Then for any $t > 0$:

$$P(|X_n - X_0| \ge t) \le 2 \exp\left(-\frac{t^2}{2 \sum_{k=1}^n c_k^2}\right)$$

5.1 Proof of Azuma-Hoeffding

The proof relies on the Chernoff Bound technique.

  1. Exponential Markov: For any $\lambda > 0$: $P(X_n - X_0 \ge t) = P\big(e^{\lambda(X_n - X_0)} \ge e^{\lambda t}\big) \le e^{-\lambda t}\, \mathbb{E}\big[e^{\lambda(X_n - X_0)}\big]$
  2. Telescoping: Write $X_n - X_0 = \sum_{k=1}^n D_k$, where $D_k = X_k - X_{k-1}$. By the tower property, $\mathbb{E}\big[e^{\lambda \sum_{k=1}^n D_k}\big] = \mathbb{E}\big[e^{\lambda \sum_{k=1}^{n-1} D_k}\, \mathbb{E}[e^{\lambda D_n} \mid \mathcal{F}_{n-1}]\big]$
  3. Hoeffding’s Lemma: Since $D_n \in [-c_n, c_n]$ (an interval of length $2c_n$) and $\mathbb{E}[D_n \mid \mathcal{F}_{n-1}] = 0$, the conditional moment generating function is bounded by that of a Rademacher variable: $\mathbb{E}[e^{\lambda D_n} \mid \mathcal{F}_{n-1}] \le e^{\lambda^2 (2c_n)^2 / 8} = e^{\lambda^2 c_n^2 / 2}$
  4. Iteration: Applying this $n$ times: $\mathbb{E}\big[e^{\lambda(X_n - X_0)}\big] \le \prod_{k=1}^n e^{\lambda^2 c_k^2 / 2} = e^{\frac{\lambda^2}{2} \sum c_k^2}$
  5. Optimization: Pick $\lambda = \frac{t}{\sum c_k^2}$ to minimize the bound: $P \le \exp\left(-\lambda t + \frac{\lambda^2}{2} \sum c_k^2\right) = \exp\left(-\frac{t^2}{2 \sum c_k^2}\right)$. The two-sided bound follows by applying the same argument to $-X_n$.
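As a quick illustration (a sketch assuming NumPy; names are illustrative), a simple symmetric random walk has $c_k = 1$, so Azuma-Hoeffding gives $P(|S_n| \ge t) \le 2e^{-t^2/(2n)}$; the empirical tail sits comfortably below the bound.

```python
import numpy as np

def check_azuma(n=100, trials=100_000, seed=5):
    """Empirical tails of a +/-1 random walk versus the Azuma-Hoeffding bound."""
    rng = np.random.default_rng(seed)
    S_n = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)
    for t in (10, 20, 30):
        empirical = np.mean(np.abs(S_n) >= t)
        bound = 2 * np.exp(-t**2 / (2 * n))
        print(f"t={t:2d}   P(|S_n| >= t) ≈ {empirical:.4f}   Azuma bound {bound:.4f}")

check_azuma()
```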

5.2 The Doob Martingale & McDiarmid’s Inequality

How do we use this for general functions? Let $Z_1, \dots, Z_n$ be independent random variables. Let $f(Z)$ be a function we want to analyze. Construct the Doob Martingale:

$$M_k = \mathbb{E}[f(Z) \mid Z_1, \dots, Z_k]$$
  • $M_0 = \mathbb{E}[f(Z)]$ (the mean).
  • $M_n = f(Z)$ (the random variable itself). The difference $D_k = M_k - M_{k-1}$ represents the “information revealed” by $Z_k$. If changing one input $z_i$ changes $f$ by at most $c_i$ (Bounded Difference Property), then $|D_k| \le c_k$.

Theorem (McDiarmid’s Inequality): For any function $f$ satisfying the bounded difference property with constants $c_1, \dots, c_n$:

$$P(|f(Z) - \mathbb{E}[f(Z)]| \ge t) \le 2 \exp\left(-\frac{2t^2}{\sum c_i^2}\right)$$

Applications (a small simulation follows the list below):

  • Chromatic Number of Random Graphs: changing one edge affects the chromatic number by at most 1. Thus, $\chi(G_{n,p})$ is extremely concentrated around its mean.
  • Machine Learning: The generalization error of a classifier concentrates because changing one training example has bounded effect (Stability).
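For a hands-on example (a sketch assuming NumPy; the setup is illustrative), throw $n$ balls independently and uniformly into $n$ bins and let $f$ be the number of empty bins. Moving a single ball changes $f$ by at most $1$, so McDiarmid with $c_i = 1$ gives $P(|f - \mathbb{E}f| \ge t) \le 2e^{-2t^2/n}$.

```python
import numpy as np

def empty_bins_concentration(n=100, trials=50_000, t=10, seed=6):
    """Number of empty bins when n balls land uniformly in n bins (c_i = 1)."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(trials)
    for i in range(trials):
        bins = rng.integers(0, n, size=n)          # bin of each ball
        counts[i] = n - np.unique(bins).size       # empty bins = n - occupied bins
    mean = counts.mean()                           # ≈ n (1 - 1/n)^n ≈ n/e
    empirical = np.mean(np.abs(counts - mean) >= t)
    bound = 2 * np.exp(-2 * t**2 / n)
    print(f"E[f] ≈ {mean:.1f}   P(|f - Ef| >= {t}) ≈ {empirical:.4f}   McDiarmid bound {bound:.4f}")

empty_bins_concentration()
```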

6. Applications in Finance

The most famous application of martingales is in option pricing. A market consists of risky assets (stocks) and a risk-free asset (bond). First Fundamental Theorem of Asset Pricing: The market has no arbitrage (free lunch) if and only if there exists a Risk-Neutral Measure $\mathbb{Q}$ equivalent to the physical measure $\mathbb{P}$ such that the discounted stock price is a $\mathbb{Q}$-martingale.

$$S_n = \mathbb{E}_{\mathbb{Q}}[e^{-r} S_{n+1} \mid \mathcal{F}_n]$$

Under $\mathbb{Q}$, the expected return of the stock is exactly the risk-free rate $r$. The real-world drift $\mu$ disappears.

6.1 Girsanov Theorem (Change of Measure)

How do we move from $\mathbb{P}$ to $\mathbb{Q}$? We re-weight the probabilities. Let $Z_n = \frac{d\mathbb{Q}}{d\mathbb{P}}\big|_{\mathcal{F}_n}$ be the Radon-Nikodym derivative process. $Z_n$ is a positive martingale with $\mathbb{E}[Z_n] = 1$. Girsanov Theorem: If $M_n$ is a martingale under $\mathbb{P}$, then $\tilde{M}_n = M_n - \sum_{k \le n} \frac{1}{Z_{k-1}} \mathbb{E}[\Delta M_k \Delta Z_k \mid \mathcal{F}_{k-1}]$ is a martingale under $\mathbb{Q}$. This allows “drift removal”: turning a biased random walk (a stock with drift) into a symmetric random walk (a martingale) by changing the probability of up/down moves. This is the essence of the Black-Scholes formula.

6.2 Deriving Black-Scholes (Heuristic)

Discrete Binomial Model $\to$ Continuous Geometric Brownian Motion. Let $\Delta t \to 0$. The stock price evolves as:

$$dS_t = \mu S_t\, dt + \sigma S_t\, dW_t$$

where $W_t$ is a Brownian Motion (the limit of the random walk). Under the physical measure $\mathbb{P}$, the drift is $\mu$. We choose a risk-neutral measure $\mathbb{Q}$ such that the drift becomes the risk-free rate $r$:

$$dS_t = r S_t\, dt + \sigma S_t\, d\tilde{W}_t$$

The option price $V(S, t)$ is the discounted expected payoff under $\mathbb{Q}$:

$$V(S_t, t) = e^{-r(T-t)} \mathbb{E}_{\mathbb{Q}}[\max(S_T - K, 0) \mid \mathcal{F}_t]$$

Solving this expectation yields the Black-Scholes PDE:

$$\frac{\partial V}{\partial t} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S \frac{\partial V}{\partial S} - rV = 0$$

Remarkably, $\mu$ (the stock’s expected return) does not appear! This is because we can perfectly hedge the risk.
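A minimal pricing sketch (assumptions: NumPy and SciPy are available; the contract and parameters are illustrative): simulate $S_T$ under $\mathbb{Q}$ as geometric Brownian motion with drift $r$, discount the average call payoff, and compare with the closed-form Black-Scholes value. Note that the physical drift $\mu$ never enters the computation.

```python
import numpy as np
from scipy.stats import norm

def mc_call_price(S0=100, K=105, r=0.03, sigma=0.2, T=1.0, paths=1_000_000, seed=7):
    """Risk-neutral Monte Carlo price of a European call under GBM (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(paths)
    S_T = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    return np.exp(-r * T) * np.maximum(S_T - K, 0).mean()

def bs_call_price(S0=100, K=105, r=0.03, sigma=0.2, T=1.0):
    """Closed-form Black-Scholes price of the same call."""
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

print(f"Monte Carlo: {mc_call_price():.3f}   Black-Scholes: {bs_call_price():.3f}")
```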

7. Randomized Algorithms & Simulation

Martingales are also used to analyze the running time of randomized algorithms. Consider a random walk on a graph $G = (V, E)$. Let $T$ be the hitting time of a target vertex set. One constructs martingales of the form $M_n = \phi(X_n) - n$ (for a suitable potential function $\phi$) to estimate $\mathbb{E}[T]$.

7.1 Simulation: Wald’s Identity

Let $X_i$ be i.i.d. with mean $\mu$. Let $S_n = \sum X_i$. Let $T$ be a stopping time with $\mathbb{E}[T] < \infty$. Wald’s Identity: $\mathbb{E}[S_T] = \mu\, \mathbb{E}[T]$.

Let’s simulate a biased random walk stopped at thresholds $-A$ or $+B$. The centered process $M_n = S_n - n\mu$ is a martingale, and the code below checks Wald’s identity numerically.

```python
import numpy as np

def simulate_wald(p=0.55, A=10, B=10, trials=1000):
    """
    Biased random walk X_i = +/- 1. P(X=1) = p. Drift mu = 2p - 1.
    Stop when S_n reaches -A or +B.
    """
    mu = 2*p - 1
    stopping_times = []
    final_values = []
    for _ in range(trials):
        S = 0
        t = 0
        while -A < S < B:
            step = 1 if np.random.rand() < p else -1
            S += step
            t += 1
        stopping_times.append(t)
        final_values.append(S)

    E_T = np.mean(stopping_times)
    E_S_T = np.mean(final_values)
    print(f"p={p}, Drift mu={mu:.2f}")
    print(f"Simulated E[T]: {E_T:.2f}")
    print(f"Simulated E[S_T]: {E_S_T:.2f}")
    print(f"Wald Check (mu * E[T]): {mu * E_T:.2f}")

    # Analytical Formula for Gambler's Ruin
    # P(Reach B) = (1 - (q/p)^A) / (1 - (q/p)^(A+B))
    q = 1 - p
    if p != 0.5:
        ratio = q / p
        prob_B = (1 - ratio**A) / (1 - ratio**(A+B))
        analytical_E_S = B * prob_B + (-A) * (1 - prob_B)
        print(f"Analytical E[S_T]: {analytical_E_S:.2f}")

# simulate_wald(p=0.55, A=10, B=10)
```

The simulation confirms that $\mathbb{E}[S_T] = \mu \cdot \mathbb{E}[T]$, provided the stopping time satisfies the conditions of Wald’s identity (in particular, $\mathbb{E}[T] < \infty$).

8. Conclusion

We have traversed the landscape of Martingale theory, from the foundational definitions of Doob to the modern applications in High-Dimensional Probability and Finance. The power of martingales lies in their universality:

  1. Convergence: If you can’t beat the house (Supermartingale), your wealth converges.
  2. Concentration: If no single step dictates the outcome (Bounded Differences), the sum concentrates with Gaussian-type tails.
  3. Pricing: If there is no arbitrage, there is a Martingale measure.

Whether analyzing the runtime of QuickSort or pricing a barrier option, the “fair game” property allows us to extract precise information from random noise.


9. Polya’s Urn Scheme

A beautiful example of martingale convergence is Polya’s Urn. Setup: Start with 1 Red and 1 Green ball. At each step $n$, pick a ball uniformly at random and return it to the urn along with another ball of the same color. Let $X_n$ be the fraction of Red balls at time $n$; initially $X_0 = 1/2$. Claim: $X_n$ is a martingale. Let $R_n, G_n$ be the number of red/green balls, so the total is $T_n = n + 2$.

$$P(R_{n+1} = R_n + 1 \mid \mathcal{F}_n) = \frac{R_n}{T_n} = X_n$$
$$\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n \frac{R_n+1}{T_n+1} + (1 - X_n) \frac{R_n}{T_n+1} = \frac{R_n(R_n + 1 + G_n)}{(R_n + G_n)(T_n + 1)} = \frac{R_n}{T_n} = X_n$$

Convergence: Since $X_n$ is bounded in $[0, 1]$, by the Martingale Convergence Theorem, $X_n \to X_\infty$ almost surely. Limit Distribution: What is $X_\infty$? It is NOT the constant $1/2$: the rich get richer. One can show by induction that $R_n$ is uniformly distributed on $\{1, \dots, n+1\}$. Thus, $X_\infty$ is uniformly distributed on $[0, 1]$ (Beta(1, 1)). If we started with $a$ red and $b$ green balls, the limit would be Beta($a$, $b$). This is a standard model for “path dependence” in economics and reinforcement learning.
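A short simulation sketch (assuming NumPy; parameters are illustrative) makes both points visible: each run of the urn settles to its own limiting fraction, and across runs those limits spread out roughly uniformly over $[0, 1]$.

```python
import numpy as np

def polya_limits(n_steps=500, runs=5_000, seed=8):
    """Simulate Polya's urn (start: 1 red, 1 green) and record the final red fraction."""
    rng = np.random.default_rng(seed)
    limits = np.empty(runs)
    for i in range(runs):
        red, total = 1, 2
        for _ in range(n_steps):
            if rng.random() < red / total:   # draw a red ball with probability R_n / T_n
                red += 1
            total += 1
        limits[i] = red / total
    counts, _ = np.histogram(limits, bins=10, range=(0, 1))
    print("Counts of X_infinity over 10 equal bins (≈ uniform):", counts)

polya_limits()
```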


Historical Timeline

  • 1934: Paul Lévy introduces the concept of Martingales (though not the name) in his work on sums of dependent variables.
  • 1939: Jean Ville coins the term “Martingale” (referencing a gambling strategy) and proves the first inequality.
  • 1953: Joseph Doob publishes “Stochastic Processes”, formalizing the theory (Decomposition, Convergence, Optional Stopping).
  • 1963–1967: Hoeffding and then Azuma prove the concentration inequality for sums and martingales with bounded differences.
  • 1973: Black & Scholes publish their option-pricing formula; the risk-neutral (martingale-measure) formulation of pricing is formalized shortly after.
  • 1990s: McDiarmid extends concentration to general functions, revolutionizing Learning Theory.

Appendix A: The Radon-Nikodym Derivative

We often define conditional expectation $\mathbb{E}[X \mid \mathcal{G}]$ as the unique $\mathcal{G}$-measurable random variable $Z$ such that:

$$\int_A Z\, dP = \int_A X\, dP, \quad \forall A \in \mathcal{G}$$

Its existence relies on the Radon-Nikodym Theorem. Let $\mu$ and $\nu$ be two measures on $(\Omega, \mathcal{F})$. We say $\nu$ is absolutely continuous with respect to $\mu$ ($\nu \ll \mu$) if $\mu(A) = 0 \implies \nu(A) = 0$. Theorem: If $\nu \ll \mu$ and both are $\sigma$-finite, there exists a unique (up to $\mu$-null sets) measurable function $f = \frac{d\nu}{d\mu}$ such that:

$$\nu(A) = \int_A f\, d\mu$$

Martingale Connection: Let $(\Omega, \mathcal{F}, \mathbb{Q})$ and $(\Omega, \mathcal{F}, \mathbb{P})$ be probability measures. The likelihood ratio process $L_n = \frac{d\mathbb{Q}|_{\mathcal{F}_n}}{d\mathbb{P}|_{\mathcal{F}_n}}$ is always a non-negative $\mathbb{P}$-martingale. If $L_n \to L_\infty$ with $\mathbb{E}[L_\infty] = 1$, then $\mathbb{Q} \ll \mathbb{P}$ on $\mathcal{F}_\infty$. If $L_n \to 0$, then $\mathbb{Q}$ and $\mathbb{P}$ are singular (mutually exclusive) in the limit (Kakutani’s Dichotomy).


Appendix B: Backward Martingales

What happens if time goes backwards? Let $\{\mathcal{G}_{-n}\}_{n \ge 0}$ be a decreasing sequence of $\sigma$-algebras: $\mathcal{G}_0 \supseteq \mathcal{G}_{-1} \supseteq \dots \supseteq \mathcal{G}_{-\infty} = \cap_n \mathcal{G}_{-n}$. A sequence $X_{-n}$ is a backward martingale if $\mathbb{E}[X_{-n} \mid \mathcal{G}_{-n-1}] = X_{-n-1}$ (equivalently, index it as $X_n$ adapted to the decreasing $\mathcal{G}_n$). Theorem: Every backward martingale converges almost surely and in $L^1$:

$$X_{-n} \to X_{-\infty} = \mathbb{E}[X_0 \mid \mathcal{G}_{-\infty}]$$

Application: Strong Law of Large Numbers (SLLN). Let $S_n = X_1 + \dots + X_n$ where the $X_i$ are i.i.d. Let $\mathcal{G}_{-n} = \sigma(S_n, S_{n+1}, \dots)$ (symmetric information). One can show that $\mathbb{E}[X_1 \mid \mathcal{G}_{-n}] = S_n / n$. Since this is a backward martingale, $S_n/n$ converges a.s. to $\mathbb{E}[X_1 \mid \text{Tail}]$. By Kolmogorov’s 0-1 Law, the tail $\sigma$-algebra is trivial, so the limit is the constant $\mu$.


Appendix C: Proof of Optional Stopping Theorem

The Optional Stopping Theorem is subtle. Here we prove the bounded case. Theorem: If $M_n$ is a martingale and $T \le K$ is a bounded stopping time, then $\mathbb{E}[M_T] = \mathbb{E}[M_0]$. Proof: We can write the stopped value as a sum of increments:

$$M_T = M_0 + \sum_{n=1}^K (M_n - M_{n-1})\, \mathbb{1}_{\{n \le T\}}$$

The event $\{n \le T\}$ is the complement of $\{T < n\} = \{T \le n-1\}$. Since $\{T \le n-1\} \in \mathcal{F}_{n-1}$, its complement $\{n \le T\}$ is also in $\mathcal{F}_{n-1}$. This means the process $H_n = \mathbb{1}_{\{n \le T\}}$ is predictable. Now consider the expectation:

$$\mathbb{E}[M_T - M_0] = \mathbb{E}\left[\sum_{n=1}^K (M_n - M_{n-1}) H_n\right]$$

By linearity of expectation (the sum is finite, with $K$ terms):

$$= \sum_{n=1}^K \mathbb{E}[(M_n - M_{n-1}) H_n]$$

Use the tower property of conditional expectation:

$$\mathbb{E}[(M_n - M_{n-1}) H_n] = \mathbb{E}\big[\mathbb{E}[(M_n - M_{n-1}) H_n \mid \mathcal{F}_{n-1}]\big]$$

Since $H_n$ is $\mathcal{F}_{n-1}$-measurable (“known at time $n-1$”):

$$= \mathbb{E}\big[H_n\, \mathbb{E}[M_n - M_{n-1} \mid \mathcal{F}_{n-1}]\big]$$

By the definition of a martingale, $\mathbb{E}[M_n - M_{n-1} \mid \mathcal{F}_{n-1}] = 0$. Thus every term in the sum is $0$, and $\mathbb{E}[M_T] = \mathbb{E}[M_0]$. $\square$ Extension to the UI Case: If $T < \infty$ a.s. but unbounded, we approximate $T$ by $T \wedge K$: $\mathbb{E}[M_{T \wedge K}] = \mathbb{E}[M_0]$, and we need to take the limit $K \to \infty$:

$$\lim_{K \to \infty} \mathbb{E}[M_{T \wedge K}] = \mathbb{E}[M_T]$$

This exchange of limit and expectation requires uniform integrability of the family $\{M_{T \wedge K}\}_K$. If the martingale is UI, this holds.



Appendix D: The Galton-Watson Branching Process

Another canonical martingale arises in population dynamics. Setup: Start with 1 individual ($Z_0 = 1$). At each step, every individual $j$ in generation $n$ produces $X_{n,j}$ offspring, i.i.d. with mean $\mu = \mathbb{E}[X]$. The population size evolves as:

$$Z_{n+1} = \sum_{j=1}^{Z_n} X_{n,j}$$

Claim: $W_n = Z_n / \mu^n$ is a nonnegative martingale. Proof:

$$\mathbb{E}[Z_{n+1} \mid \mathcal{F}_n] = \mathbb{E}\left[\sum_{j=1}^{Z_n} X_{n,j} \,\Bigg|\, \mathcal{F}_n\right] = Z_n\, \mathbb{E}[X] = Z_n \mu$$

Dividing by $\mu^{n+1}$:

$$\mathbb{E}\left[\frac{Z_{n+1}}{\mu^{n+1}} \,\Big|\, \mathcal{F}_n\right] = \frac{Z_n \mu}{\mu^{n+1}} = \frac{Z_n}{\mu^n} = W_n$$

Convergence: Since $W_n \ge 0$, it converges a.s. to a limit $W_\infty$. Interpretation (a simulation sketch follows the list below):

  • If $\mu \le 1$ (subcritical/critical), the population goes extinct ($Z_n \to 0$) almost surely (unless $P(X=1) = 1$), and $W_\infty = 0$.
  • If $\mu > 1$ (supercritical), there is a positive probability of survival. On the survival set, $Z_n$ grows geometrically like $\mu^n \cdot W_\infty$. The random variable $W_\infty$ describes the “random magnitude” of the explosion.
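A small simulation sketch (assuming NumPy, and taking Poisson($\mu$) offspring as an illustrative choice) tracks $W_n = Z_n/\mu^n$ for a supercritical process: the sample mean of $W_n$ stays near $1$, a fraction of runs go extinct ($W_\infty = 0$), and surviving runs settle to positive random limits.

```python
import numpy as np

def galton_watson(mu=1.5, generations=25, runs=20_000, seed=9):
    """Track W_n = Z_n / mu^n for a branching process with Poisson(mu) offspring."""
    rng = np.random.default_rng(seed)
    Z = np.ones(runs, dtype=np.int64)          # Z_0 = 1 in every run
    for _ in range(generations):
        # Sum of Z independent Poisson(mu) offspring counts is Poisson(mu * Z).
        Z = rng.poisson(lam=mu * Z)
    W = Z / mu**generations
    print(f"mean of W_n ≈ {W.mean():.3f}   (martingale: should stay ≈ 1)")
    print(f"extinct fraction ≈ {(Z == 0).mean():.3f}")

galton_watson()
```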

Appendix E: Glossary of Terms

  • Adapted Process: A process $X_n$ where the value at time $n$ is known given the information $\mathcal{F}_n$.
  • Filtration: An increasing stream of information $\sigma$-algebras.
  • Martingale: A process where the best prediction of the future is the present value.
  • Predictable Process: A process $A_n$ whose value at time $n$ is known at time $n-1$.
  • Stopping Time: A random time $T$ where the decision to stop depends only on the past and present.
  • Submartingale: A process that tends to increase (favorable game).
  • Uniform Integrability (UI): A condition preventing probability mass from “escaping to infinity”, ensuring $L^1$ convergence.
  • Upcrossing: The event of a process crossing from below a level $a$ to above a level $b$.

References

1. Williams, D. (1991). “Probability with Martingales”. The gold standard for discrete-time martingales. Famous for its clear intuition (“The method of the gambling team”).

2. Durrett, R. (2019). “Probability: Theory and Examples”. The standard graduate text. Covers measure theory rigor.

3. Chung, K. L. (1974). “A Course in Probability Theory”. Classic text with deep insights into convergence theorems.

4. Steele, J. M. (2004). “The Cauchy-Schwarz Master Class”. Excellent for inequalities (Azuma, etc.).