Martingale Theory

A martingale is a sequence of random variables in which the conditional expectation of the next value, given all past information, equals the present value. This conservation property $\mathbb{E}[X_{n+1} | \mathcal{F}_n] = X_n$ yields strong convergence results and sits at the foundation of a few big areas.

  1. Mathematical finance: the Efficient Market Hypothesis and option pricing.
  2. Sequential analysis: deciding when to stop a clinical trial or an A/B test via Wald's Identity.
  3. Modern statistics: in particular, concentration bounds for randomized algorithms via Azuma-Hoeffding.

What follows is a measure-theoretic walk through discrete-time martingales and covers convergence theorems and applications in finance and computer science.

1. Definitions and Measure-Theoretic Setup

Let $(\Omega, \mathcal{F}, P)$ be a probability space.

1.1 Information and Filtrations

We model how information accumulates over time with a filtration.

Definition (Filtration). A filtration is an increasing sequence of sub-$\sigma$-algebras $\{\mathcal{F}_n\}_{n \ge 0}$ with

$$\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \dots \subseteq \mathcal{F}_\infty \subseteq \mathcal{F}$$

where $\mathcal{F}_n$ represents the information available at time $n$.

  • Natural filtration. $\mathcal{F}_n = \sigma(X_0, \dots, X_n)$, the $\sigma$-algebra generated by the history of the process itself.
  • Dyadic filtration. On $\Omega = [0, 1)$ let $\mathcal{F}_n$ be generated by the intervals $[k 2^{-n}, (k+1) 2^{-n})$; as $n \to \infty$ the point $x$ is pinned down with infinite precision through its binary expansion.

Definition (Adapted Process). A sequence $X = (X_n)_{n \ge 0}$ is adapted to $\{\mathcal{F}_n\}$ when $X_n$ is $\mathcal{F}_n$-measurable for every $n$: the value of $X_n$ is determined by the information available at time $n$.

1.2 Martingales, Submartingales, Supermartingales

Definition. An adapted, integrable sequence $(X_n)$ is a martingale with respect to $\{\mathcal{F}_n\}$ when

  1. Integrability. $\mathbb{E}[|X_n|] < \infty$ for all $n$.
  2. Martingale property. $\mathbb{E}[X_{n+1} | \mathcal{F}_n] = X_n$ almost surely.

Swapping the equality for an inequality gives two cousins.

  • Submartingale ($\ge$). $\mathbb{E}[X_{n+1} | \mathcal{F}_n] \ge X_n$: the process tends to drift upward. By Jensen's inequality, this also covers convex functions of martingales.
  • Supermartingale ($\le$). $\mathbb{E}[X_{n+1} | \mathcal{F}_n] \le X_n$: the process tends to drift downward; this is the standard model of an unfavorable game.

1.3 Canonical Examples

  1. Simple Symmetric Random Walk. $S_n = \sum_{i=1}^n \xi_i$ where $\xi_i = \pm 1$ with probability $1/2$ each. Then $\mathbb{E}[S_{n+1} | \mathcal{F}_n] = S_n + \mathbb{E}[\xi_{n+1}] = S_n$, so $S_n$ is a martingale.
  2. Biased Random Walk. When $P(\xi_i = 1) = p > 1/2$ you get $\mathbb{E}[S_{n+1} | \mathcal{F}_n] = S_n + (2p-1) > S_n$, a submartingale. But the geometric process $M_n = ((1-p)/p)^{S_n}$ is a martingale: De Moivre's martingale.
  3. Polya's Urn. An urn holds $r$ red and $g$ green balls; you pick one at random and return it along with another of the same color. Let $X_n$ be the fraction of red balls at time $n$; then $X_n$ is a martingale. Because $X_n$ is bounded in $[0, 1]$ it converges almost surely, and the limit $X_\infty$ follows a Beta distribution rather than concentrating at a point.
  4. Likelihood Ratios. Let $P$ and $Q$ be probability measures and let $L_n = \frac{dQ|_{\mathcal{F}_n}}{dP|_{\mathcal{F}_n}}$ be the Radon-Nikodym derivative restricted to the first $n$ observations. Under the measure $P$, $(L_n)$ is a nonnegative martingale.
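As a quick numerical sanity check on Example 2, the sketch below (assuming NumPy; the function name is ours) simulates the biased walk and verifies that the mean of De Moivre's martingale stays at $M_0 = 1$ over time:

```python
import numpy as np

def de_moivre_check(p=0.6, n_steps=30, trials=200_000, seed=0):
    """Empirically check that M_n = ((1-p)/p)^{S_n} has constant mean:
    the martingale property implies E[M_n] = M_0 = 1 for all n."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([1, -1], size=(trials, n_steps), p=[p, 1 - p])
    S = np.cumsum(steps, axis=1)          # biased random walk paths
    M = ((1 - p) / p) ** S                # De Moivre's martingale
    return M.mean(axis=0)                 # should hover near 1.0 at every n

means = de_moivre_check()
print(means[::10])  # ~1.0 at each sampled time, up to Monte Carlo error
```

The variance of $M_n$ grows with $n$, so the later entries fluctuate more around 1 for a fixed number of trials.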

2. Doob’s Decomposition

Every submartingale splits uniquely into a martingale piece that holds the fluctuations and a predictable increasing piece that holds the drift. This is the discrete-time version of the Doob-Meyer Decomposition.

Theorem (Doob Decomposition).

Let $(X_n)$ be a submartingale. Then there is a unique decomposition $X_n = M_n + A_n$ where

  1. $(M_n)$ is a martingale.
  2. $(A_n)$ is a predictable increasing process ($A_n$ is $\mathcal{F}_{n-1}$-measurable) with $A_0 = 0$.

Proof. Set $\Delta A_n = \mathbb{E}[X_n - X_{n-1} | \mathcal{F}_{n-1}]$; because $X$ is a submartingale, $\Delta A_n \ge 0$. Let $A_n = \sum_{k=1}^n \Delta A_k$; then $A_n$ is predictable and increasing. Set $M_n = X_n - A_n$; then $\mathbb{E}[M_n - M_{n-1} | \mathcal{F}_{n-1}] = 0$.
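A concrete instance: for a simple symmetric walk $S_n$, the submartingale $X_n = S_n^2$ has predictable compensator $A_n = n$, since $\mathbb{E}[S_n^2 - S_{n-1}^2 | \mathcal{F}_{n-1}] = \mathbb{E}[2 S_{n-1}\xi_n + 1 | \mathcal{F}_{n-1}] = 1$. The sketch below (a Monte Carlo check, assuming NumPy) verifies that the martingale part $M_n = S_n^2 - n$ has mean zero:

```python
import numpy as np

def doob_decomposition_demo(n_steps=50, trials=100_000, seed=1):
    """For X_n = S_n^2 with S_n a simple symmetric walk, the Doob
    decomposition is X_n = M_n + A_n with A_n = n. Check that
    M_n = S_n^2 - n has mean E[M_n] = 0 at every time step."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([1, -1], size=(trials, n_steps))
    S = np.cumsum(steps, axis=1)
    M = S**2 - np.arange(1, n_steps + 1)   # martingale part of S_n^2
    return M.mean(axis=0)                  # ~0 at every time step

M_means = doob_decomposition_demo()
print(M_means[::10])
```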

2.5 Doob’s Maximal Inequalities

The next inequality bounds the fluctuations of a submartingale by its endpoint. Let $M_n^* = \max_{0 \le k \le n} |X_k|$.

Theorem (Doob's Submartingale Inequality). When $X$ is a non-negative submartingale, then for any $\lambda > 0$

$$P(M_n^* \ge \lambda) \le \frac{\mathbb{E}[X_n]}{\lambda}$$

Proof (Sketch). Let $E = \{M_n^* \ge \lambda\}$ and split $E$ by the first time the process crosses $\lambda$, so $E_k = \{X_k \ge \lambda,\ X_j < \lambda\ \forall j < k\}$. Because $X$ is a submartingale, $\mathbb{E}[X_n \mathbb{1}_{E_k}] \ge \mathbb{E}[X_k \mathbb{1}_{E_k}] \ge \lambda P(E_k)$, and summing over $k$ gives the result.

Corollary ($L^p$ Inequality). When $X_n$ is a martingale with $X_n \in L^p$ for $p > 1$, then

$$\|M_n^*\|_p \le \frac{p}{p-1} \|X_n\|_p$$

So the maximum of the process is controlled by the endpoint in $L^p$, and $L^p$ convergence of the endpoint upgrades to uniform convergence of the whole path.
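The maximal inequality can be checked numerically. The sketch below (assuming NumPy; names are ours) applies it to the non-negative submartingale $X_k = |S_k|$ for a simple symmetric walk and compares both sides:

```python
import numpy as np

def doob_maximal_check(n_steps=100, lam=15.0, trials=100_000, seed=2):
    """Check Doob's inequality for the non-negative submartingale
    X_k = |S_k|:  P(max_k |S_k| >= lam) <= E[|S_n|] / lam."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([1, -1], size=(trials, n_steps))
    S = np.cumsum(steps, axis=1)
    lhs = (np.abs(S).max(axis=1) >= lam).mean()   # tail of the running max
    rhs = np.abs(S[:, -1]).mean() / lam           # endpoint bound
    return lhs, rhs

lhs, rhs = doob_maximal_check()
print(f"P(max >= lam) = {lhs:.4f} <= bound = {rhs:.4f}")
```

The bound is not tight here; it is a worst-case guarantee over all submartingales, not a sharp estimate for this one.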

3. Stopping Times and Optional Stopping

A random variable $T: \Omega \to \mathbb{N} \cup \{\infty\}$ is a stopping time when $\{T \le n\} \in \mathcal{F}_n$ for all $n$. The decision to stop can rely only on present and past information.

  • Example. “Stop when $S_n \ge 10$” is a stopping time.
  • Non-example. “Stop at the time the process hits its maximum” requires knowledge of the future and is not a stopping time.

The stopped process $X_n^T = X_{T \wedge n}$ retains the martingale property, so when $X$ is a martingale, so is $X^T$. The deeper question is whether $\mathbb{E}[X_T] = \mathbb{E}[X_0]$.

Counter-example (Doubling Strategy). Let $S_n$ be a symmetric random walk starting at 0 and let $T = \inf\{n : S_n = 1\}$. Then $T < \infty$ almost surely and $S_T = 1$, so $\mathbb{E}[S_T] = 1 \ne \mathbb{E}[S_0] = 0$. The equality fails because $S_n$ can drop arbitrarily far negative before hitting 1, which violates the integrability conditions for optional stopping.

Theorem (Doob's Optional Stopping Theorem).

When $X_n$ is a martingale and $T$ is a stopping time, then $\mathbb{E}[X_T] = \mathbb{E}[X_0]$ when any of the following hold.

  1. $T$ is bounded ($T \le N$ a.s.).
  2. $X$ is uniformly bounded ($|X_n| \le K$ a.s.).
  3. $X$ is uniformly integrable and $T < \infty$ a.s.

3.5 Uniform Integrability (UI)

Uniform integrability is the condition that makes expectations converge, so that $\mathbb{E}[X_n] \to \mathbb{E}[X]$.

Definition. A family $\{X_i\}_{i \in I}$ is uniformly integrable when

$$\lim_{K \to \infty} \sup_{i \in I} \mathbb{E}[|X_i| \mathbb{1}_{|X_i| > K}] = 0$$

Uniform integrability prevents probability mass from escaping to infinity. In the doubling strategy the mass does escape: at any finite time there is a small probability of a large loss, and these contributions do not vanish uniformly.

Theorem. $X_n \to X$ in $L^1$ if and only if $X_n \to X$ in probability and $\{X_n\}$ is UI.

Because conditional expectation is an $L^1$ contraction, uniform integrability is the natural regularity condition for martingale convergence.

Doob’s UI Theorem. When $X_n$ is a martingale, the following are equivalent.

  1. $\{X_n\}$ is UI.
  2. $X_n$ converges in $L^1$ to a limit $X_\infty$.
  3. $X_n = \mathbb{E}[X_\infty | \mathcal{F}_n]$; the martingale is “closed”.

4. Martingale Convergence Theorems

The basic question is when a martingale converges to a limit.

Theorem (Doob's Martingale Convergence Theorem).

When $(X_n)$ is a supermartingale bounded in $L^1$, i.e. $\sup_n \mathbb{E}|X_n| < \infty$, then $X_n$ converges almost surely to a random variable $X_\infty$ with $\mathbb{E}|X_\infty| < \infty$.

The Upcrossing Inequality. The upcrossing inequality sits at the heart of the convergence proof. Let $U_N[a, b]$ count how many times the process crosses upward from below $a$ to above $b$ by time $N$. The crossing times are defined recursively.

  • $T_1 = \min\{n : X_n \le a\}$
  • $T_2 = \min\{n > T_1 : X_n \ge b\}$
  • $T_3 = \min\{n > T_2 : X_n \le a\}$

and so on; $U_N$ counts the completed upcrossings.

Doob’s Upcrossing Inequality.

$$(b - a)\, \mathbb{E}[U_N[a, b]] \le \mathbb{E}[(X_N - a)^-]$$

Proof Idea. Think of a gambling strategy that buys at level $a$ and sells at level $b$. Each completed upcrossing yields a profit of $b - a$, and the upcrossing intervals do not overlap. Because the underlying process is a supermartingale, i.e. an unfavorable game, the total expected profit is bounded above, and the right-hand side accounts for the worst-case loss at the final time.

Because $\mathbb{E}|X_N| < \infty$, the expected number of upcrossings is finite. So $X_n$ cannot oscillate between any two rationals $a < b$ infinitely often, and convergence follows.

4.5 Application: The Secretary Problem (Optimal Stopping)

A classic application of martingale theory, via Snell envelopes, is the Secretary Problem.

Setup. You interview $N$ candidates one after another and can rank each against the ones already seen. After each interview you must hire or reject, and a rejection is final.

Goal. Maximize the probability of picking the absolute best candidate.

Strategy. Reject the first $k-1$ candidates, then hire the first one better than everyone before. Let $M_n = P(\text{Win} \mid \text{Stop at } n)$. We build the Snell envelope, the smallest supermartingale dominating the payoff process. By backward induction, the optimal stopping rule is to skip roughly $N/e$ candidates, and the success probability tends to $1/e \approx 37\%$.
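The $1/e$ rule is easy to test by simulation. This sketch (assuming NumPy; the function name is ours) implements the skip-$N/e$ strategy and estimates the win probability:

```python
import numpy as np

def secretary_success(N=100, trials=20_000, seed=3):
    """Simulate the skip-N/e rule: observe the first k candidates,
    then hire the first candidate better than all seen so far.
    Success = hiring the overall best; expect ~1/e ≈ 0.37."""
    rng = np.random.default_rng(seed)
    k = int(round(N / np.e))
    wins = 0
    for _ in range(trials):
        ranks = rng.permutation(N)        # ranks[i] = quality of candidate i
        best_seen = ranks[:k].max()
        hired = None
        for i in range(k, N):
            if ranks[i] > best_seen:      # first candidate beating the sample
                hired = ranks[i]
                break
        if hired == N - 1:                # rank N-1 is the best candidate
            wins += 1
    return wins / trials

p_win = secretary_success()
print(f"win probability: {p_win:.3f} (1/e = {1/np.e:.3f})")
```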

5. Concentration Inequalities

One of the biggest modern uses of martingales is pinning down sharp concentration of random variables around their means.

Theorem (Azuma-Hoeffding Inequality).

Let $(X_n)$ be a martingale with bounded differences $|X_k - X_{k-1}| \le c_k$ almost surely. Then for any $t > 0$

$$P(|X_n - X_0| \ge t) \le 2 \exp\left( -\frac{t^2}{2 \sum_{k=1}^n c_k^2} \right)$$

5.1 Proof of Azuma-Hoeffding

The proof runs through the Chernoff bound trick.

  1. Exponential Markov. For any $\lambda > 0$, $P(X_n - X_0 \ge t) = P(e^{\lambda(X_n - X_0)} \ge e^{\lambda t}) \le e^{-\lambda t}\, \mathbb{E}[e^{\lambda(X_n - X_0)}]$.
  2. Telescoping. Write $X_n - X_0 = \sum_{k=1}^n D_k$ where $D_k = X_k - X_{k-1}$. Then $\mathbb{E}[e^{\lambda \sum_k D_k}] = \mathbb{E}\left[ e^{\lambda \sum_{k=1}^{n-1} D_k}\, \mathbb{E}[e^{\lambda D_n} | \mathcal{F}_{n-1}] \right]$.
  3. Hoeffding’s Lemma. Because $\mathbb{E}[D_n | \mathcal{F}_{n-1}] = 0$ and $D_n \in [-c_n, c_n]$, an interval of length $2c_n$, we get $\mathbb{E}[e^{\lambda D_n} | \mathcal{F}_{n-1}] \le e^{\lambda^2 c_n^2 / 2}$.
  4. Iteration. Apply this $n$ times to get $\mathbb{E}[e^{\lambda(X_n - X_0)}] \le \prod_{k=1}^n e^{\lambda^2 c_k^2 / 2} = e^{\frac{\lambda^2}{2} \sum_k c_k^2}$.
  5. Optimization. Pick $\lambda = \frac{t}{\sum_k c_k^2}$ to minimize the bound: $P \le \exp\left( -\lambda t + \frac{\lambda^2}{2} \sum_k c_k^2 \right) = \exp\left( -\frac{t^2}{2 \sum_k c_k^2} \right)$.
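The bound above can be compared against an empirical tail. A minimal sketch (assuming NumPy) using the simple symmetric walk, where $c_k = 1$:

```python
import numpy as np

def azuma_check(n=200, t=30.0, trials=100_000, seed=4):
    """Compare the empirical tail P(|S_n| >= t) for a simple symmetric
    walk (bounded differences c_k = 1) with the Azuma-Hoeffding bound
    2 exp(-t^2 / (2n))."""
    rng = np.random.default_rng(seed)
    S = rng.choice([1, -1], size=(trials, n)).sum(axis=1)
    empirical = (np.abs(S) >= t).mean()
    bound = 2 * np.exp(-t**2 / (2 * n))
    return empirical, bound

emp, bound = azuma_check()
print(f"empirical = {emp:.4f} <= Azuma bound = {bound:.4f}")
```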

5.2 The Doob Martingale and McDiarmid’s Inequality

To extend Azuma-Hoeffding to a general function of independent random variables we use the Doob martingale construction. Let $Z_1, \dots, Z_n$ be independent random variables and let $f(Z)$ be a function of interest. The Doob martingale is

$$M_k = \mathbb{E}[f(Z) | Z_1, \dots, Z_k]$$
  • $M_0 = \mathbb{E}[f(Z)]$ is the unconditional mean.
  • $M_n = f(Z)$ is the realized value.

The difference $D_k = M_k - M_{k-1}$ captures the information contributed by $Z_k$. When changing any one input $z_i$ shifts $f$ by at most $c_i$ (the bounded difference property), then $|D_k| \le c_k$.

Theorem (McDiarmid’s Inequality). For any function $f$ with the bounded difference property with constants $c_1, \dots, c_n$,

$$P(|f(Z) - \mathbb{E}[f(Z)]| \ge t) \le 2 \exp\left( -\frac{2t^2}{\sum_i c_i^2} \right)$$

Application.

  • Chromatic Number of Random Graphs. Changing one edge shifts the chromatic number by at most 1, so $\chi(G_{n,p})$ concentrates tightly around its mean.
  • Machine Learning. The generalization error of a classifier concentrates because changing one training example has a bounded effect on the output; this is where stability-based bounds come from.
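The simplest sanity check for McDiarmid's inequality takes $f$ to be the empirical mean of $n$ bounded variables, where each $c_i = 1/n$. A minimal sketch (assuming NumPy):

```python
import numpy as np

def mcdiarmid_check(n=100, t=0.1, trials=100_000, seed=5):
    """f(Z) = mean of n iid Uniform(0,1) variables. Changing one Z_i
    moves f by at most c_i = 1/n, so sum c_i^2 = 1/n and McDiarmid
    gives P(|f - E f| >= t) <= 2 exp(-2 t^2 n)."""
    rng = np.random.default_rng(seed)
    means = rng.random((trials, n)).mean(axis=1)
    empirical = (np.abs(means - 0.5) >= t).mean()
    bound = 2 * np.exp(-2 * t**2 * n)
    return empirical, bound

emp_m, bound_m = mcdiarmid_check()
print(f"empirical = {emp_m:.5f} <= McDiarmid bound = {bound_m:.4f}")
```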

6. Applications in Finance

The main use of martingale theory in mathematical finance is option pricing. A market holds risky assets (stocks) and a risk-free asset (a bond).

First Fundamental Theorem of Asset Pricing. The market has no arbitrage exactly when there is a risk-neutral measure $\mathbb{Q}$, equivalent to the physical measure $\mathbb{P}$, under which the discounted stock price is a $\mathbb{Q}$-martingale.

$$S_n = \mathbb{E}_{\mathbb{Q}}[e^{-r} S_{n+1} | \mathcal{F}_n]$$

Under $\mathbb{Q}$ the expected return of the stock equals the risk-free rate $r$, and the real-world drift $\mu$ drops out.

6.1 Girsanov Theorem (Change of Measure)

The move from $\mathbb{P}$ to $\mathbb{Q}$ happens by re-weighting probabilities. Let $Z_n = \frac{d\mathbb{Q}}{d\mathbb{P}}\big|_{\mathcal{F}_n}$ be the Radon-Nikodym derivative process; $Z_n$ is a positive martingale with $\mathbb{E}[Z_n] = 1$.

Girsanov Theorem. When $M_n$ is a martingale under $\mathbb{P}$, then $\tilde{M}_n = M_n - \sum_{k \le n} \frac{1}{Z_{k-1}} \mathbb{E}[\Delta M_k \Delta Z_k | \mathcal{F}_{k-1}]$ is a martingale under $\mathbb{Q}$. This lets you strip out the drift: a biased random walk modeling a stock with drift becomes a martingale under the new measure once the probabilities of up and down moves are adjusted. Drift removal through a change of measure underlies the Black-Scholes formula.

6.2 Deriving Black-Scholes (Heuristic)

In the continuous-time limit of the binomial model, as $\Delta t \to 0$, the stock price follows

$$dS_t = \mu S_t\, dt + \sigma S_t\, dW_t$$

where $W_t$ is a Brownian motion, the scaling limit of a random walk. Under the physical measure $\mathbb{P}$ the drift is $\mu$. Passing to the risk-neutral measure $\mathbb{Q}$ replaces $\mu$ with the risk-free rate $r$:

$$dS_t = r S_t\, dt + \sigma S_t\, d\tilde{W}_t$$

The option price $V(S, t)$ is the discounted expected payoff under $\mathbb{Q}$.

$$V(S_t, t) = e^{-r(T-t)}\, \mathbb{E}_{\mathbb{Q}}[\max(S_T - K, 0) | \mathcal{F}_t]$$

Working this expectation out gives the Black-Scholes PDE.

$$\frac{\partial V}{\partial t} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S \frac{\partial V}{\partial S} - rV = 0$$

The stock’s expected return $\mu$ does not appear in this equation; this reflects the fact that the option’s risk can be hedged away completely.
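As an illustrative check (not part of the derivation above), the sketch below prices a European call two ways: the closed-form Black-Scholes formula and a risk-neutral Monte Carlo average of the discounted payoff. The two should agree up to sampling error; the parameter values are arbitrary.

```python
import math
import numpy as np

def bs_call(S0, K, r, sigma, T):
    """Closed-form Black-Scholes price of a European call."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))  # standard normal CDF
    return S0 * N(d1) - K * math.exp(-r * T) * N(d2)

def mc_call(S0, K, r, sigma, T, paths=500_000, seed=6):
    """Risk-neutral Monte Carlo: simulate S_T under Q (drift r, not mu)
    and discount the expected payoff."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * Z)
    return math.exp(-r * T) * np.maximum(ST - K, 0).mean()

closed = bs_call(100, 100, 0.05, 0.2, 1.0)
mc = mc_call(100, 100, 0.05, 0.2, 1.0)
print(f"closed-form: {closed:.3f}, Monte Carlo: {mc:.3f}")
```

Note that $\mu$ appears nowhere in either computation, exactly as the PDE predicts.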

7. Randomized Algorithms and Simulation

Martingales give a natural framework for analyzing the runtime of randomized algorithms. Take a random walk on a graph $G = (V, E)$ and let $T$ be the hitting time of a vertex set. We build martingales $M_n = \phi(X_n) - n$ to estimate $\mathbb{E}[T]$.

7.1 Simulation: Wald’s Identity

Let $X_i$ be i.i.d. with mean $\mu$, let $S_n = \sum_{i=1}^n X_i$, and let $T$ be a stopping time with $\mathbb{E}[T] < \infty$.

Wald’s Identity. $\mathbb{E}[S_T] = \mu\, \mathbb{E}[T]$.

The simulation below runs a biased random walk stopped at thresholds $-A$ and $+B$ and checks that $M_n = S_n - n\mu$ is a martingale and that Wald’s identity holds.

```python
import numpy as np

def simulate_wald(p=0.55, A=10, B=10, trials=1000):
    """
    Biased random walk X_i = +/- 1 with P(X=1) = p, drift mu = 2p - 1.
    Stop when S_n reaches -A or +B.
    """
    mu = 2*p - 1
    stopping_times = []
    final_values = []
    for _ in range(trials):
        S = 0
        t = 0
        while -A < S < B:
            step = 1 if np.random.rand() < p else -1
            S += step
            t += 1
        stopping_times.append(t)
        final_values.append(S)
    E_T = np.mean(stopping_times)
    E_S_T = np.mean(final_values)
    print(f"p={p}, Drift mu={mu:.2f}")
    print(f"Simulated E[T]: {E_T:.2f}")
    print(f"Simulated E[S_T]: {E_S_T:.2f}")
    print(f"Wald Check (mu * E[T]): {mu * E_T:.2f}")
    # Analytical formula for Gambler's Ruin:
    # P(reach B) = (1 - (q/p)^A) / (1 - (q/p)^(A+B))
    q = 1 - p
    if p != 0.5:
        ratio = q/p
        prob_B = (1 - ratio**A) / (1 - ratio**(A+B))
        analytical_E_S = B * prob_B + (-A) * (1 - prob_B)
        print(f"Analytical E[S_T]: {analytical_E_S:.2f}")

# simulate_wald(p=0.55, A=10, B=10)
```

The simulation confirms $\mathbb{E}[S_T] = \mathbb{E}[T] \cdot \mu$ as long as the stopping time has finite expectation.

8. Conclusion

The power of martingales comes from how universal they are.

  1. Convergence. An $L^1$-bounded supermartingale converges almost surely.
  2. Concentration. Under bounded differences, sums show sub-Gaussian concentration.
  3. Pricing. No arbitrage is equivalent to the existence of a martingale measure.

Whether you are looking at QuickSort runtime or barrier option pricing the martingale property gives a single framework for pulling precise quantitative information out of stochastic processes.


9. Polya’s Urn Scheme

Polya’s Urn gives a clean picture of martingale convergence.

Setup. Start with 1 red and 1 green ball. At each step $n$, draw a ball at random and return it to the urn along with one more ball of the same color. Let $X_n$ be the fraction of red balls at time $n$; at the start, $X_0 = 1/2$.

Claim. $X_n$ is a martingale. Let $R_n$ and $G_n$ be the numbers of red and green balls, with total $T_n = n + 2$.

$$P(R_{n+1} = R_n + 1 | \mathcal{F}_n) = \frac{R_n}{T_n} = X_n$$

$$\mathbb{E}[X_{n+1} | \mathcal{F}_n] = X_n \frac{R_n+1}{T_n+1} + (1-X_n) \frac{R_n}{T_n+1} = \frac{R_n(R_n + 1 + G_n)}{(R_n+G_n)(T_n+1)} = \frac{R_n}{T_n} = X_n$$

Convergence. Because $X_n$ is bounded in $[0, 1]$, the Martingale Convergence Theorem gives $X_n \to X_\infty$ almost surely.

Limit Distribution. The limit $X_\infty$ is not a constant: by induction one checks that $R_n$ is uniform on $\{1, \dots, n+1\}$, so $X_\infty$ is uniform on $[0, 1]$, i.e. Beta(1, 1). More generally, starting with $a$ red and $b$ green balls yields a Beta($a$, $b$) limit. The Polya urn is a standard model for path dependence in economics and reinforcement learning.
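The uniform limit is easy to see numerically. This sketch (assuming NumPy; names are ours) runs many urns in parallel and checks the mean and variance of the limiting fraction against Uniform(0,1), which has mean $1/2$ and variance $1/12$:

```python
import numpy as np

def polya_limits(n_steps=2000, trials=20_000, seed=7):
    """Simulate Polya's urn starting from 1 red, 1 green and return the
    fraction of red balls after n_steps draws; the limit X_inf should be
    approximately Uniform(0,1)."""
    rng = np.random.default_rng(seed)
    red = np.ones(trials)
    total = 2 * np.ones(trials)
    for _ in range(n_steps):
        draw_red = rng.random(trials) < red / total   # draw a ball
        red += draw_red                               # add one of same color
        total += 1
    return red / total

X = polya_limits()
print(f"mean = {X.mean():.3f} (expect 0.500), var = {X.var():.3f} (expect 0.083)")
```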


The Radon-Nikodym Derivative

Conditional expectation $\mathbb{E}[X | \mathcal{G}]$ is defined as the unique $\mathcal{G}$-measurable random variable $Z$ such that

$$\int_A Z\, dP = \int_A X\, dP \quad \forall A \in \mathcal{G}$$

and the Radon-Nikodym theorem guarantees that such a $Z$ exists. Let $\mu$ and $\nu$ be two measures on $(\Omega, \mathcal{F})$. We say $\nu$ is absolutely continuous with respect to $\mu$, written $\nu \ll \mu$, when $\mu(A) = 0$ forces $\nu(A) = 0$.

Theorem. When $\nu \ll \mu$ and both are $\sigma$-finite, there is a unique measurable function $f = \frac{d\nu}{d\mu}$ with

$$\nu(A) = \int_A f\, d\mu$$

Martingale connection. Let $(\Omega, \mathcal{F}, \mathbb{Q})$ and $(\Omega, \mathcal{F}, \mathbb{P})$ be probability measures. The likelihood ratio process $L_n = \frac{d\mathbb{Q}|_{\mathcal{F}_n}}{d\mathbb{P}|_{\mathcal{F}_n}}$ is always a non-negative $\mathbb{P}$-martingale. When $L_n \to L_\infty$ with $\mathbb{E}[L_\infty] = 1$, we get $\mathbb{Q} \ll \mathbb{P}$ on $\mathcal{F}_\infty$. When $L_n \to 0$, the measures $\mathbb{Q}$ and $\mathbb{P}$ are mutually singular in the limit; this is Kakutani’s Dichotomy.


Backward Martingales

Take a decreasing sequence of $\sigma$-algebras $\{\mathcal{G}_{-n}\}_{n \ge 0}$ with $\mathcal{G}_0 \supseteq \mathcal{G}_{-1} \supseteq \dots \supseteq \mathcal{G}_{-\infty} = \cap_n \mathcal{G}_{-n}$. A sequence $X_{-n}$ is a backward martingale when $\mathbb{E}[X_{-n} | \mathcal{G}_{-n-1}] = X_{-n-1}$.

Theorem. Every backward martingale converges almost surely and in $L^1$.

$$X_{-n} \to X_{-\infty} = \mathbb{E}[X_0 | \mathcal{G}_{-\infty}]$$

This connects to the strong law of large numbers. Let $S_n = X_1 + \dots + X_n$ where the $X_i$ are i.i.d., and let $\mathcal{G}_{-n} = \sigma(S_n, S_{n+1}, \dots)$, the symmetric $\sigma$-algebra. One checks that $\mathbb{E}[X_1 | \mathcal{G}_{-n}] = S_n / n$. Because this is a backward martingale, $S_n/n$ converges almost surely to $\mathbb{E}[X_1 | \text{Tail}]$, and by Kolmogorov’s 0-1 Law the tail $\sigma$-algebra is trivial, so the limit is the constant $\mu$.
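The convergence $S_n/n \to \mu$ is easy to watch numerically. A minimal sketch (assuming NumPy; names are ours) with $\pm 1$ steps of mean $\mu = 0.3$:

```python
import numpy as np

def lln_demo(mu=0.3, n=200_000, seed=8):
    """Track S_n / n for i.i.d. +/-1 steps with mean mu; the
    backward-martingale argument predicts convergence to mu."""
    rng = np.random.default_rng(seed)
    X = rng.choice([1, -1], size=n, p=[(1 + mu) / 2, (1 - mu) / 2])
    running = np.cumsum(X) / np.arange(1, n + 1)
    return running[[99, 9_999, n - 1]]   # S_n/n at n = 100, 10^4, 2*10^5

vals = lln_demo()
print(vals)  # successive values settle toward 0.3
```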


Proof of the Optional Stopping Theorem

We prove the Optional Stopping Theorem in the bounded case.

Theorem. When $M_n$ is a martingale and $T \le K$ is a bounded stopping time, then $\mathbb{E}[M_T] = \mathbb{E}[M_0]$.

Proof. Write the stopped value as a sum of increments.

$$M_T = M_0 + \sum_{n=1}^K (M_n - M_{n-1}) \mathbb{1}_{n \le T}$$

The event $\{n \le T\}$ is the complement of $\{T < n\} = \{T \le n-1\}$. Because $\{T \le n-1\} \in \mathcal{F}_{n-1}$, the complement $\{n \le T\}$ also lies in $\mathcal{F}_{n-1}$. This means the process $H_n = \mathbb{1}_{n \le T}$ is predictable. Now take expectations.

$$\mathbb{E}[M_T - M_0] = \mathbb{E}\left[ \sum_{n=1}^K (M_n - M_{n-1}) H_n \right]$$

By linearity of expectation (the sum is finite),

$$= \sum_{n=1}^K \mathbb{E}[(M_n - M_{n-1}) H_n]$$

Use the tower property of conditional expectation.

$$\mathbb{E}[(M_n - M_{n-1}) H_n] = \mathbb{E}\big[ \mathbb{E}[(M_n - M_{n-1}) H_n | \mathcal{F}_{n-1}] \big]$$

Because $H_n$ is $\mathcal{F}_{n-1}$-measurable,

$$= \mathbb{E}\big[ H_n\, \mathbb{E}[M_n - M_{n-1} | \mathcal{F}_{n-1}] \big]$$

and by the martingale property $\mathbb{E}[M_n - M_{n-1} | \mathcal{F}_{n-1}] = 0$. So every term in the sum vanishes and $\mathbb{E}[M_T] = \mathbb{E}[M_0]$.


Origins

Paul Lévy introduced the main ideas behind martingales in 1934 while working on sums of dependent random variables, though he did not use the name. Jean Ville coined the term “martingale” in 1939, as a reference to a class of gambling strategies, and proved the first supermartingale inequality. Joseph Doob’s 1953 book “Stochastic Processes” established the modern theory, covering the decomposition theorem, the convergence theorem, and optional stopping. Hoeffding (1963) and Azuma (1967) proved the bounded-difference concentration inequality, which became a workhorse in combinatorics and computer science. The link to finance came together with Black and Scholes in 1973, when risk-neutral pricing recast option valuation as a martingale expectation. McDiarmid extended concentration to general functions of independent variables in 1989, making it a standard tool in learning theory and algorithm analysis.