Martingale Theory
A martingale is a sequence of random variables where the conditional expectation of the next value given all the past information comes out equal to the present value. This conservation property hands you strong convergence results and sits at the foundation of a few big areas.
- Mathematical finance and things like the Efficient Market Hypothesis and option pricing.
- Sequential analysis like deciding when to stop a clinical trial or an A/B test with Wald’s Identity.
- Modern statistics and especially concentration bounds for randomized algorithms with Azuma-Hoeffding.
What follows is a measure-theoretic walk through discrete-time martingales and covers convergence theorems and applications in finance and computer science.
1. Definitions and Measure-Theoretic Setup
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space.
1.1 Information and Filtrations
We model how information piles up over time with a filtration. Definition (Filtration). A filtration is an increasing sequence of sub-$\sigma$-algebras
$$\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots \subseteq \mathcal{F},$$
where $\mathcal{F}_n$ stands in for the information you have at time $n$.
- Natural filtration. $\mathcal{F}_n = \sigma(X_0, X_1, \dots, X_n)$, the $\sigma$-algebra spun up by the history of the process itself.
- Dyadic filtration. On $\Omega = [0, 1)$ let $\mathcal{F}_n$ come from the dyadic intervals $[k 2^{-n}, (k+1) 2^{-n})$, $k = 0, \dots, 2^n - 1$; as $n \to \infty$ the point $\omega$ gets nailed down with infinite precision through its binary expansion.
Definition (Adapted Process). A sequence $(X_n)_{n \ge 0}$ is adapted to $(\mathcal{F}_n)$ when $X_n$ is $\mathcal{F}_n$-measurable for every $n$, and this means the value of $X_n$ is set by the information you have at time $n$.
1.2 Martingales, Submartingales, Supermartingales
Definition. An adapted and integrable sequence $(X_n)$ is a martingale with respect to $(\mathcal{F}_n)$ when
- Integrability. $\mathbb{E}|X_n| < \infty$ for all $n$.
- Martingale property. $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n$ almost surely.
Swapping the equality for an inequality gives two cousins.
- Submartingale ($\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] \ge X_n$). The process tends to drift up; by Jensen's inequality this also covers convex functions of martingales.
- Supermartingale ($\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] \le X_n$). The process tends to drift down; this is the standard model of an unfavorable game.
1.3 Canonical Examples
- Simple Symmetric Random Walk. $S_n = \sum_{i=1}^n X_i$, where $X_i = \pm 1$ with probability $\tfrac{1}{2}$ each; $(S_n)$ is a martingale.
- Biased Random Walk. When $\mathbb{P}(X_i = +1) = p > \tfrac{1}{2}$, the walk $(S_n)$ is a submartingale. But the geometric process $M_n = (q/p)^{S_n}$, with $q = 1 - p$, is a martingale; this is De Moivre's martingale.
- Polya's Urn. An urn holds red and green balls; you pick one uniformly at random and return it along with another of the same color. Let $M_n$ be the fraction of red balls at time $n$; then $(M_n)$ is a martingale. Because $M_n$ is bounded in $[0, 1]$ it converges almost surely, and the limit follows a Beta distribution and does not pile up at a point.
- Likelihood Ratios. Let $P$ and $Q$ be probability measures and let $L_n = \frac{dQ}{dP}\big|_{\mathcal{F}_n}$ be the Radon-Nikodym derivative cut down to the first $n$ observations. Under the measure $P$, the sequence $(L_n)$ is a nonnegative martingale.
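As a quick sanity check, the one-step martingale property of De Moivre's martingale can be verified numerically; the value p = 0.6 below is an arbitrary illustrative choice.

```python
# One-step check of De Moivre's martingale: for a biased walk with up-probability p,
# M_n = (q/p)**S_n satisfies E[M_{n+1} | S_n = s] = M_n for every state s.
# p = 0.6 is an arbitrary illustrative choice; any p in (0, 1) with p != 1/2 works.
p = 0.6
q = 1 - p
for s in range(-5, 6):
    m = (q / p) ** s
    # one step: up with probability p, down with probability q
    one_step = p * (q / p) ** (s + 1) + q * (q / p) ** (s - 1)
    assert abs(one_step - m) < 1e-9, (s, one_step, m)
print("De Moivre one-step martingale property verified")
```

The algebra behind the check: $p (q/p)^{s+1} + q (q/p)^{s-1} = (q/p)^s (q + p) = (q/p)^s$.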
2. Doob’s Decomposition
Every submartingale splits uniquely into a martingale piece that holds the fluctuations and a predictable increasing piece that holds the drift. This is the discrete-time version of the Doob-Meyer Decomposition.
Theorem. Let $(X_n)$ be a submartingale. There is a unique decomposition $X_n = M_n + A_n$ where
- $(M_n)$ is a martingale.
- $(A_n)$ is a predictable increasing process ($A_n$ is $\mathcal{F}_{n-1}$-measurable), with $A_0 = 0$.
Proof. Set $d_n = \mathbb{E}[X_n - X_{n-1} \mid \mathcal{F}_{n-1}]$, and because $(X_n)$ is a submartingale you have $d_n \ge 0$. Let $A_n = \sum_{k=1}^n d_k$, and then $(A_n)$ is predictable and increasing. Set $M_n = X_n - A_n$, and then $\mathbb{E}[M_n - M_{n-1} \mid \mathcal{F}_{n-1}] = d_n - d_n = 0$.
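The decomposition is easy to see numerically for the submartingale $S_n^2$ built from a simple symmetric walk: the predictable part is $A_n = n$, so $M_n = S_n^2 - n$ should have constant mean zero. A minimal Monte Carlo sketch (simulation sizes are arbitrary):

```python
import numpy as np

# Doob decomposition of the submartingale X_n = S_n**2 for a simple symmetric
# random walk: E[S_n^2 - S_{n-1}^2 | F_{n-1}] = 1, so A_n = n and the
# martingale part M_n = S_n**2 - n has E[M_n] = 0 for every n.
# Simulation sizes are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n, trials = 50, 200_000
S = rng.choice([-1, 1], size=(trials, n)).cumsum(axis=1)
M = S**2 - np.arange(1, n + 1)          # martingale part at each time
for t in [10, 25, 50]:
    print(f"E[M_{t}] ~ {M[:, t - 1].mean():+.3f} (theory 0)")
```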
2.5 Doob’s Maximal Inequalities
The next inequality bounds the fluctuations of a submartingale by its endpoint. Let $X_n^* = \max_{k \le n} X_k$. Theorem (Doob's Submartingale Inequality). When $(X_n)$ is a non-negative submartingale, then for any $\lambda > 0$,
$$\lambda \, \mathbb{P}(X_n^* \ge \lambda) \le \mathbb{E}[X_n].$$
Proof (Sketch). Let $A = \{X_n^* \ge \lambda\}$ and split $A$ by the first time $\tau$ the process crosses $\lambda$, so $A = \bigsqcup_{k=0}^{n} \{\tau = k\}$. Because $(X_n)$ is a submartingale you get $\lambda \, \mathbb{P}(\tau = k) \le \mathbb{E}[X_k \mathbf{1}_{\{\tau = k\}}] \le \mathbb{E}[X_n \mathbf{1}_{\{\tau = k\}}]$, and summing over $k$ hands you the result.
Corollary ($L^p$ Maximal Inequality). When $(X_n)$ is a martingale and $p > 1$, then
$$\Big\| \max_{k \le n} |X_k| \Big\|_p \le \frac{p}{p-1} \, \|X_n\|_p.$$
And so the maximum of the process is bounded by the endpoint in $L^p$, and convergence of the endpoint steps up to uniform convergence of the whole path.
3. Stopping Times and Optional Stopping
A random variable $\tau$ is a stopping time when $\{\tau \le n\} \in \mathcal{F}_n$ for all $n$. The decision to stop can only lean on the present and past information.
- Example. "Stop the first time $X_n$ reaches a fixed level $a$" is a stopping time.
- Non-example. “Stop at the time the process hits its maximum” needs knowledge of the future and is not a stopping time.
The stopped process $X_{n \wedge \tau}$ hangs onto the martingale property, and so when $(X_n)$ is a martingale so is $(X_{n \wedge \tau})$. The deeper question is whether $\mathbb{E}[X_\tau] = \mathbb{E}[X_0]$.
Counter-example (Doubling Strategy). Let $(S_n)$ be a symmetric random walk starting at 0 and let $\tau = \inf\{n : S_n = 1\}$. Then $\tau < \infty$ almost surely and $\mathbb{E}[S_\tau] = 1 \ne 0 = \mathbb{E}[S_0]$. The equality breaks because $S_n$ can drop arbitrarily far negative before hitting 1, and this breaks the integrability conditions for optional stopping.
Theorem (Optional Stopping). When $(X_n)$ is a martingale and $\tau$ is a stopping time, then $\mathbb{E}[X_\tau] = \mathbb{E}[X_0]$ when any of the following hold.
- $\tau$ is bounded ($\tau \le N$ a.s. for some fixed $N$).
- The stopped process is uniformly bounded ($|X_{n \wedge \tau}| \le C$ a.s.).
- $(X_n)$ is uniformly integrable and $\tau < \infty$ a.s.
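The first condition can be seen at work on the doubling counter-example itself: capping $\tau$ at a horizon $N$ gives a bounded stopping time $\tau \wedge N$, so $\mathbb{E}[S_{\tau \wedge N}] = 0$ even though $S_\tau = 1$ on the event that the walk hits 1. A simulation sketch (parameters are illustrative):

```python
import numpy as np

# The doubling counter-example, truncated: tau = first hitting time of +1 for a
# simple symmetric walk, capped at horizon N. Since tau ∧ N is a bounded stopping
# time, optional stopping gives E[S_{tau ∧ N}] = 0 exactly, even though S_tau = 1
# whenever the walk does hit +1. Parameters are illustrative.
rng = np.random.default_rng(1)
N, trials = 1000, 20_000
steps = rng.choice(np.array([-1, 1], dtype=np.int16), size=(trials, N))
paths = steps.cumsum(axis=1, dtype=np.int16)
hit = (paths == 1)
# index of the first visit to +1, or N-1 if the walk never got there
first = np.where(hit.any(axis=1), hit.argmax(axis=1), N - 1)
stopped = paths[np.arange(trials), first]   # 1 if the walk hit +1, else S_N
print(f"P(hit +1 by time N) ~ {hit.any(axis=1).mean():.3f}")
print(f"E[S_(tau∧N)] ~ {stopped.mean():.3f}  (optional stopping: exactly 0)")
```

The rare paths that never reach 1 end deep in negative territory, and their large losses exactly cancel the many small gains.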
3.5 Uniform Integrability (UI)
Uniform integrability is the condition that makes expectations converge, so that $\mathbb{E}[X_\tau] = \lim_{n} \mathbb{E}[X_{n \wedge \tau}]$. Definition. A family $(X_i)_{i \in I}$ is uniformly integrable when
$$\lim_{K \to \infty} \sup_{i \in I} \mathbb{E}\big[ |X_i| \, \mathbf{1}_{\{|X_i| > K\}} \big] = 0.$$
Uniform integrability stops probability mass from running off to infinity. In the doubling strategy the mass does run off, and at any finite time there is a small probability of a big loss and these contributions do not fade out in a uniform way.
Theorem. $X_n \to X$ in $L^1$ exactly when $X_n \to X$ in probability and $(X_n)$ is UI.
Because conditional expectation is an $L^1$ contraction, uniform integrability is the natural regularity condition for martingale convergence.
Doob’s UI Theorem. When $(X_n)$ is a martingale the following are equivalent.
- $(X_n)$ is UI.
- $(X_n)$ converges in $L^1$ to a limit $X_\infty$.
- $X_n = \mathbb{E}[X_\infty \mid \mathcal{F}_n]$ for all $n$, and the martingale is "closed".
4. Martingale Convergence Theorems
The basic question is figuring out when a martingale actually settles down to a limit.
Theorem (Martingale Convergence). When $(X_n)$ is a supermartingale bounded in $L^1$, so that $\sup_n \mathbb{E}|X_n| < \infty$, then $X_n$ converges almost surely to a random variable $X_\infty$ with $\mathbb{E}|X_\infty| < \infty$.
The Upcrossing Inequality. The upcrossing inequality sits at the heart of the convergence proof. Let $U_n[a, b]$ count how many times the process crosses upward from below $a$ to above $b$ by time $n$. The crossing times are set recursively:
$$\sigma_1 = \inf\{m \ge 0 : X_m \le a\}, \qquad \tau_1 = \inf\{m > \sigma_1 : X_m \ge b\},$$
$$\sigma_2 = \inf\{m > \tau_1 : X_m \le a\}, \qquad \tau_2 = \inf\{m > \sigma_2 : X_m \ge b\},$$
and so on, and $U_n[a, b] = \max\{k : \tau_k \le n\}$ counts the completed upcrossings.
Doob's Upcrossing Inequality. For a supermartingale $(X_n)$,
$$(b - a) \, \mathbb{E}\big[ U_n[a, b] \big] \le \mathbb{E}\big[ (X_n - a)^- \big].$$
Proof Idea. Think of a gambling strategy that buys at level $a$ and sells at level $b$. Each completed upcrossing hands you a profit of at least $b - a$, and the upcrossing intervals do not overlap. Because the underlying process is a supermartingale, and so an unfavorable game, the total expected profit is bounded above by zero, and the right-hand side holds the worst-case loss on the final, incomplete upcrossing attempt.
Because $\sup_n \mathbb{E}|X_n| < \infty$, the expected number of upcrossings is finite for every pair $a < b$. And so $X_n$ cannot swing between any two rationals infinitely often, and convergence follows.
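A small empirical check of the upcrossing inequality, using a downward-biased walk as the supermartingale (all parameters are arbitrary illustrative choices):

```python
import numpy as np

# Empirical check of Doob's upcrossing inequality for a downward-biased random
# walk (a supermartingale): (b - a) * E[U_n[a, b]] <= E[(X_n - a)^-].
# All parameters are arbitrary illustrative choices.
rng = np.random.default_rng(8)
n, trials, a, b = 200, 5_000, -2, 2
p = 0.45                                   # P(step = +1); p < 1/2 => supermartingale

def count_upcrossings(path, a, b):
    """Completed upcrossings: drop to <= a, then rise to >= b."""
    count, below = 0, False
    for x in path:
        if not below and x <= a:
            below = True
        elif below and x >= b:
            count += 1
            below = False
    return count

steps = np.where(rng.random((trials, n)) < p, 1, -1)
paths = steps.cumsum(axis=1)
U = np.array([count_upcrossings(path, a, b) for path in paths])
lhs = (b - a) * U.mean()
rhs = np.maximum(a - paths[:, -1], 0).mean()   # (X_n - a)^- = max(a - X_n, 0)
print(f"(b - a) E[U] ~ {lhs:.2f} <= E[(X_n - a)^-] ~ {rhs:.2f}")
```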
4.5 Application: The Secretary Problem (Optimal Stopping)
A classic application of martingale theory, through Snell envelopes, is the Secretary Problem. Setup. You interview $n$ candidates one after another and you can rank each against the ones you have already seen. After each interview you have to hire or reject, and a rejection is final. Goal. Push up the probability of picking the absolute best candidate. Strategy. Reject the first $k$ candidates and then hire the first one better than everyone before. Let $Z_k$ be the payoff process, the probability of success when stopping at time $k$. We build the Snell envelope, which is the smallest supermartingale sitting above the payoff process $(Z_k)$. By backward induction you can show the optimal stopping rule is to skip roughly $n/e$ candidates. The success probability runs toward $1/e \approx 0.368$.
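A Monte Carlo sketch of the $1/e$ rule (the cast of $n = 100$ candidates and the trial count are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of the 1/e rule for the secretary problem: skip the first
# ~n/e candidates, then hire the first candidate better than everyone seen so far.
# n = 100 and the trial count are arbitrary illustrative choices.
rng = np.random.default_rng(2)
n, trials = 100, 20_000
k = int(n / np.e)                      # length of the rejection phase
wins = 0
for _ in range(trials):
    ranks = rng.permutation(n)         # higher value = better candidate
    best_of_first_k = ranks[:k].max()
    hired = None
    for j in range(k, n):
        # Any candidate beating best_of_first_k also beats everyone in between,
        # since those were all rejected for failing the same comparison.
        if ranks[j] > best_of_first_k:
            hired = ranks[j]
            break
    wins += (hired == n - 1)           # did we hire the overall best?
print(f"success probability ~ {wins / trials:.3f} (theory 1/e ~ {1 / np.e:.3f})")
```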
5. Concentration Inequalities
One of the biggest modern uses of martingales is pinning down sharp concentration of random variables around their means.
Theorem (Azuma-Hoeffding). Let $(X_n)$ be a martingale with bounded differences $|X_k - X_{k-1}| \le c_k$ almost surely. Then for any $t > 0$,
$$\mathbb{P}\big( |X_n - X_0| \ge t \big) \le 2 \exp\left( -\frac{t^2}{2 \sum_{k=1}^n c_k^2} \right).$$
5.1 Proof of Azuma-Hoeffding
The proof runs through the Chernoff bound trick.
- Exponential Markov. For any $\lambda > 0$,
$$\mathbb{P}(X_n - X_0 \ge t) \le e^{-\lambda t} \, \mathbb{E}\big[ e^{\lambda (X_n - X_0)} \big].$$
- Telescoping. Write $X_n - X_0 = \sum_{k=1}^n d_k$, where $d_k = X_k - X_{k-1}$.
- Hoeffding's Lemma. Because $\mathbb{E}[d_k \mid \mathcal{F}_{k-1}] = 0$ and $d_k \in [-c_k, c_k]$, which is an interval of length $2 c_k$, you get
$$\mathbb{E}\big[ e^{\lambda d_k} \mid \mathcal{F}_{k-1} \big] \le e^{\lambda^2 c_k^2 / 2}.$$
- Iteration. Apply this $n$ times, conditioning step by step, to get
$$\mathbb{E}\big[ e^{\lambda (X_n - X_0)} \big] \le \exp\Big( \frac{\lambda^2}{2} \sum_{k=1}^n c_k^2 \Big).$$
- Optimization. Pick $\lambda = t / \sum_{k=1}^n c_k^2$ to make the bound as small as possible; a union bound over the two tails gives the factor of 2.
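The finished bound is easy to check empirically for the simple symmetric walk, where every $c_k = 1$ (the horizon and trial count below are arbitrary):

```python
import numpy as np

# Empirical tail of a simple symmetric walk vs the Azuma-Hoeffding bound
# P(|S_n| >= t) <= 2 exp(-t^2 / (2n)) with c_k = 1.
# Horizon n and trial count are arbitrary illustrative choices.
rng = np.random.default_rng(3)
n, trials = 100, 100_000
S_n = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)
for t in [10, 20, 30]:
    empirical = (np.abs(S_n) >= t).mean()
    bound = 2 * np.exp(-t**2 / (2 * n))
    print(f"t={t}: empirical {empirical:.4f} <= Azuma bound {bound:.4f}")
    assert empirical <= bound
```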
5.2 The Doob Martingale and McDiarmid’s Inequality
To push Azuma-Hoeffding onto a general function of independent random variables we use the Doob martingale construction. Let $Z_1, \dots, Z_n$ be independent random variables and let $f(Z_1, \dots, Z_n)$ be a function you care about. The Doob martingale is
$$M_i = \mathbb{E}\big[ f(Z_1, \dots, Z_n) \mid Z_1, \dots, Z_i \big].$$
- $M_0 = \mathbb{E}[f]$ is the unconditional mean.
- $M_n = f(Z_1, \dots, Z_n)$ is the realized value. The difference $M_i - M_{i-1}$ captures the information that $Z_i$ kicks in. When changing any one input shifts $f$ by at most $c_i$, which is the bounded difference property, then $|M_i - M_{i-1}| \le c_i$.
Theorem (McDiarmid's Inequality). For any function $f$ that has the bounded difference property with constants $c_1, \dots, c_n$,
$$\mathbb{P}\big( |f - \mathbb{E}[f]| \ge t \big) \le 2 \exp\left( -\frac{2 t^2}{\sum_{i=1}^n c_i^2} \right).$$
Applications.
- Chromatic Number of Random Graphs. Changing one edge shifts the chromatic number by at most 1, so $\chi(G(n, p))$ packs tightly around its mean.
- Machine Learning. The generalization error of a classifier concentrates because changing one training example has a bounded effect on the output, and this is where stability-based bounds come from.
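A toy instance of McDiarmid's inequality: take $f$ to be the number of distinct values among $n$ i.i.d. draws, which changes by at most 1 when any single draw changes ($c_i = 1$). The alphabet size and tail level below are arbitrary choices:

```python
import numpy as np

# McDiarmid demo: f(Z_1..Z_n) = number of distinct values among n i.i.d. draws
# from {0, ..., m-1}. Changing one Z_i moves f by at most 1 (c_i = 1), so
# P(|f - E f| >= t) <= 2 exp(-2 t^2 / n). Alphabet size m and tail level t
# are arbitrary illustrative choices.
rng = np.random.default_rng(4)
n, m, trials = 100, 50, 20_000
draws = rng.integers(0, m, size=(trials, n))
f = np.array([len(set(row)) for row in draws])
t = 15
empirical = (np.abs(f - f.mean()) >= t).mean()
bound = 2 * np.exp(-2 * t**2 / n)
print(f"E[f] ~ {f.mean():.1f}")
print(f"empirical tail {empirical:.4f} <= McDiarmid bound {bound:.4f}")
```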
6. Applications in Finance
The main use of martingale theory in mathematical finance is option pricing. A market holds risky assets like stocks and a risk-free asset like a bond. First Fundamental Theorem of Asset Pricing. The market has no arbitrage exactly when there is a risk-neutral measure $Q$, equivalent to the physical measure $P$, under which the discounted stock price $e^{-rt} S_t$ is a $Q$-martingale.
Under $Q$ the expected return of the stock comes out to the risk-free rate $r$, and the real-world drift drops out.
6.1 Girsanov Theorem (Change of Measure)
The move from $P$ to $Q$ happens by re-weighting probabilities. Let $Z_n = \frac{dQ}{dP}\big|_{\mathcal{F}_n}$ be the Radon-Nikodym derivative process; $(Z_n)$ is a positive martingale with $\mathbb{E}^P[Z_n] = 1$. Girsanov Theorem. When $(Z_n X_n)$ is a martingale under $P$, then $(X_n)$ is a martingale under $Q$. This lets you strip the drift out: a biased random walk for a stock with drift turns into a martingale under the new measure when you tweak the probabilities of up and down moves. Drift removal through a change of measure sits under the Black-Scholes formula.
6.2 Deriving Black-Scholes (Heuristic)
In the continuous-time limit of the binomial model, as the step size shrinks, the stock price runs as
$$dS_t = \mu S_t \, dt + \sigma S_t \, dW_t,$$
where $W_t$ is a Brownian motion, the scaling limit of a random walk. Under the physical measure the drift is $\mu$. Picking the risk-neutral measure $Q$ swaps $\mu$ for the risk-free rate $r$.
The option price is the discounted expected payoff under $Q$:
$$V_0 = e^{-rT} \, \mathbb{E}^Q\big[ (S_T - K)^+ \big].$$
Working this expectation out gives the Black-Scholes PDE:
$$\frac{\partial V}{\partial t} + r S \frac{\partial V}{\partial S} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} = r V.$$
The stock's expected return $\mu$ does not show up in this equation, and this falls out of the fact that the option's risk can be hedged away completely.
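A minimal sketch of risk-neutral pricing as a martingale expectation: simulate $S_T$ under $Q$ (drift $r$), discount the payoff, and compare with the closed-form Black-Scholes price. All parameter values are illustrative:

```python
import math
import numpy as np

# Risk-neutral pricing as a martingale expectation: Monte Carlo estimate of
# e^{-rT} E_Q[(S_T - K)^+] under geometric Brownian motion with drift r,
# compared with the closed-form Black-Scholes call price.
# All parameter values are illustrative.
rng = np.random.default_rng(5)
S0, K, r, sigma, T, trials = 100.0, 100.0, 0.05, 0.2, 1.0, 500_000

Z = rng.standard_normal(trials)
S_T = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * Z)
mc_price = math.exp(-r * T) * np.maximum(S_T - K, 0).mean()

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
d2 = d1 - sigma * math.sqrt(T)
bs_price = S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
print(f"Monte Carlo: {mc_price:.3f}, Black-Scholes: {bs_price:.3f}")
```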
7. Randomized Algorithms and Simulation
Martingales give you a natural setup for working out the runtime of randomized algorithms. Take a random walk on a graph and let $\tau$ be the hitting time of a vertex set. We build martingales to estimate $\mathbb{E}[\tau]$.
7.1 Simulation: Wald’s Identity
Let $X_1, X_2, \dots$ be i.i.d. with mean $\mu$, let $S_n = \sum_{i=1}^n X_i$, and let $\tau$ be a stopping time with $\mathbb{E}[\tau] < \infty$. Wald's Identity. $\mathbb{E}[S_\tau] = \mu \, \mathbb{E}[\tau]$.
The simulation below runs a biased random walk stopped at thresholds $-A$ and $B$ and checks Wald's identity $\mathbb{E}[S_\tau] = \mu \, \mathbb{E}[\tau]$ against the gambler's-ruin formula.
```python
import numpy as np

def simulate_wald(p=0.55, A=10, B=10, trials=1000):
    """
    Biased random walk with steps X_i = +/- 1.
    P(X = 1) = p. Drift mu = 2p - 1.
    Stop when S_n reaches -A or +B.
    """
    mu = 2*p - 1
    stopping_times = []
    final_values = []
    for _ in range(trials):
        S = 0
        t = 0
        while -A < S < B:
            step = 1 if np.random.rand() < p else -1
            S += step
            t += 1
        stopping_times.append(t)
        final_values.append(S)
    E_T = np.mean(stopping_times)
    E_S_T = np.mean(final_values)
    print(f"p={p}, Drift mu={mu:.2f}")
    print(f"Simulated E[T]: {E_T:.2f}")
    print(f"Simulated E[S_T]: {E_S_T:.2f}")
    print(f"Wald Check (mu * E[T]): {mu * E_T:.2f}")
    # Analytical formula for Gambler's Ruin:
    # P(reach B) = (1 - (q/p)^A) / (1 - (q/p)^(A+B))
    q = 1 - p
    if p != 0.5:
        ratio = q/p
        prob_B = (1 - ratio**A) / (1 - ratio**(A+B))
        analytical_E_S = B * prob_B + (-A) * (1 - prob_B)
        print(f"Analytical E[S_T]: {analytical_E_S:.2f}")

simulate_wald(p=0.55, A=10, B=10)
```
The simulation backs up Wald's identity as long as the stopping time has finite expectation.
8. Conclusion
The power of martingales comes from how universal they are.
- Convergence. An $L^1$-bounded supermartingale converges almost surely.
- Concentration. Under bounded differences, martingale sums show sub-Gaussian concentration.
- Pricing. No arbitrage is the same as having a martingale measure.
Whether you are looking at QuickSort runtime or barrier option pricing the martingale property gives a single framework for pulling precise quantitative information out of stochastic processes.
9. Polya’s Urn Scheme
Polya’s Urn gives a nice picture of martingale convergence. Setup. Start with 1 red and 1 green ball. At each step you pull a ball at random and then return it to the urn along with one more ball of the same color. Let $M_n$ be the fraction of red balls at time $n$; at the start $M_0 = \tfrac{1}{2}$. Claim. $(M_n)$ is a martingale. Let $R_n$ and $G_n$ be the number of red and green balls; the total is $R_n + G_n = n + 2$, and
$$\mathbb{E}[M_{n+1} \mid \mathcal{F}_n] = \frac{R_n}{n+2} \cdot \frac{R_n + 1}{n+3} + \frac{G_n}{n+2} \cdot \frac{R_n}{n+3} = \frac{R_n (R_n + G_n + 1)}{(n+2)(n+3)} = \frac{R_n}{n+2} = M_n.$$
Convergence. Because $M_n$ is bounded in $[0, 1]$, the Martingale Convergence Theorem says $M_n \to M_\infty$ almost surely. Limit Distribution. The limit is not a constant: by induction you can check that $R_n$ is uniform on $\{1, 2, \dots, n+1\}$, and so $M_\infty$ is uniform on $[0, 1]$, which is Beta(1, 1). More generally, starting with $r$ red and $g$ green balls hands you a Beta($r$, $g$) limit. The Polya urn is a standard model for path dependence in economics and reinforcement learning.
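A vectorized simulation sketch of the urn (simulation sizes are arbitrary): the empirical mean and variance of the red fraction should land near the Uniform[0, 1] values $\tfrac{1}{2}$ and $\tfrac{1}{12}$.

```python
import numpy as np

# Polya urn simulation, vectorized across trials: each urn starts with 1 red and
# 1 green ball. The red fraction M_n is a martingale whose limit is
# Uniform[0, 1] = Beta(1, 1), so mean ~ 1/2 and variance ~ 1/12.
# Simulation sizes are arbitrary illustrative choices.
rng = np.random.default_rng(6)
steps, trials = 500, 2_000
red = np.ones(trials)
total = 2.0
for _ in range(steps):
    drew_red = rng.random(trials) < red / total   # draw a ball in every urn
    red += drew_red                               # add a ball of the drawn color
    total += 1.0
fractions = red / total
print(f"mean ~ {fractions.mean():.3f} (theory 0.500)")
print(f"variance ~ {fractions.var():.3f} (theory {1/12:.3f})")
```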
The Radon-Nikodym Derivative
Conditional expectation $\mathbb{E}[X \mid \mathcal{G}]$ is set as the unique $\mathcal{G}$-measurable random variable $Y$ where
$$\int_A Y \, d\mathbb{P} = \int_A X \, d\mathbb{P} \quad \text{for all } A \in \mathcal{G},$$
and the Radon-Nikodym theorem guarantees that such a $Y$ exists. Let $\mu$ and $\nu$ be two measures on $(\Omega, \mathcal{F})$. $\nu$ is absolutely continuous with respect to $\mu$, written $\nu \ll \mu$, when $\mu(A) = 0$ forces $\nu(A) = 0$. Theorem. When $\nu \ll \mu$ and both are $\sigma$-finite, there is a ($\mu$-a.e.) unique measurable function $f = \frac{d\nu}{d\mu}$ with
$$\nu(A) = \int_A f \, d\mu \quad \text{for all } A \in \mathcal{F}.$$
Martingale connection. Let $P$ and $Q$ be probability measures. The likelihood ratio process $L_n = \frac{dQ}{dP}\big|_{\mathcal{F}_n}$ is always a non-negative $P$-martingale. When $Q \ll P$ with $L = \frac{dQ}{dP}$, you get $L_n = \mathbb{E}^P[L \mid \mathcal{F}_n] \to L$. When the measures $P$ and $Q$ are mutually singular, $L_n \to 0$ in the limit; for infinite product measures exactly one of these alternatives occurs, and this is Kakutani's Dichotomy.
Backward Martingales
Take a decreasing sequence of $\sigma$-algebras $\mathcal{G}_0 \supseteq \mathcal{G}_1 \supseteq \mathcal{G}_2 \supseteq \cdots$. A sequence $(X_n)$ is a backward martingale when $\mathbb{E}[X_n \mid \mathcal{G}_{n+1}] = X_{n+1}$. Theorem. Every backward martingale converges almost surely and in $L^1$.
This hooks into the strong law of large numbers. Let $S_n = X_1 + \cdots + X_n$, where the $X_i$ are i.i.d. with mean $\mu$. Let $\mathcal{G}_n = \sigma(S_n, S_{n+1}, S_{n+2}, \dots)$, which plays the role of the symmetric $\sigma$-algebra. You can check by symmetry that $\mathbb{E}[X_1 \mid \mathcal{G}_n] = S_n / n$. Because this makes $(S_n / n)$ a backward martingale, it converges almost surely, and by Kolmogorov's 0-1 Law the limit is measurable with respect to a trivial $\sigma$-algebra and so is the constant $\mu$.
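A quick numeric illustration of this view of the SLLN, with exponential draws of mean $\mu = 2$ (an arbitrary choice):

```python
import numpy as np

# The SLLN as backward-martingale convergence: the running mean S_n/n of i.i.d.
# draws settles down to the true mean mu as n grows.
# The exponential distribution and mu = 2 are arbitrary illustrative choices.
rng = np.random.default_rng(7)
mu = 2.0
x = rng.exponential(scale=mu, size=100_000)
running_mean = x.cumsum() / np.arange(1, x.size + 1)
for n in [100, 10_000, 100_000]:
    print(f"n = {n:>6}: S_n/n = {running_mean[n - 1]:.4f} (mu = {mu})")
```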
Proof of the Optional Stopping Theorem
We prove the Optional Stopping Theorem in the bounded case. Theorem. When $(X_n)$ is a martingale and $\tau \le N$ is a bounded stopping time, then $\mathbb{E}[X_\tau] = \mathbb{E}[X_0]$. Proof. You can write the stopped value as a sum of increments:
$$X_\tau = X_0 + \sum_{k=1}^{N} \mathbf{1}_{\{\tau \ge k\}} \, (X_k - X_{k-1}).$$
The event $\{\tau \ge k\}$ is the complement of $\{\tau \le k-1\}$. Because $\{\tau \le k-1\} \in \mathcal{F}_{k-1}$, the complement also sits in $\mathcal{F}_{k-1}$. And this means the indicator process is predictable. Now look at the expectation.
By linearity of expectation, because the sum is finite, you get
$$\mathbb{E}[X_\tau] = \mathbb{E}[X_0] + \sum_{k=1}^{N} \mathbb{E}\big[ \mathbf{1}_{\{\tau \ge k\}} \, (X_k - X_{k-1}) \big].$$
Use the tower property of conditional expectation:
$$\mathbb{E}\big[ \mathbf{1}_{\{\tau \ge k\}} \, (X_k - X_{k-1}) \big] = \mathbb{E}\Big[ \mathbb{E}\big[ \mathbf{1}_{\{\tau \ge k\}} \, (X_k - X_{k-1}) \mid \mathcal{F}_{k-1} \big] \Big].$$
Because $\mathbf{1}_{\{\tau \ge k\}}$ is $\mathcal{F}_{k-1}$-measurable,
$$\mathbb{E}\big[ \mathbf{1}_{\{\tau \ge k\}} \, (X_k - X_{k-1}) \mid \mathcal{F}_{k-1} \big] = \mathbf{1}_{\{\tau \ge k\}} \, \mathbb{E}\big[ X_k - X_{k-1} \mid \mathcal{F}_{k-1} \big],$$
and by the martingale property $\mathbb{E}[X_k - X_{k-1} \mid \mathcal{F}_{k-1}] = 0$. And so every term in the sum dies and you get $\mathbb{E}[X_\tau] = \mathbb{E}[X_0]$.
Origins
Paul Lévy brought in the main ideas behind martingales in 1934 while working on sums of dependent random variables, though he did not use the name. Jean Ville coined the term "martingale" in 1939 as a reference to a class of gambling strategies and proved the first supermartingale inequality. Joseph Doob's 1953 book "Stochastic Processes" set up the modern theory and covered the decomposition theorem, the convergence theorem, and optional stopping. Hoeffding (1963) and Azuma (1967) proved the bounded-difference concentration inequality, and it turned into a workhorse in combinatorics and computer science. The link to finance came together with Black and Scholes in 1973, when risk-neutral pricing recast option valuation as a martingale expectation. McDiarmid pushed concentration out to general functions of independent variables in the late 1980s and made it a standard tool in learning theory and algorithm analysis.