Summer 2020

Martingales bounded in \(\mathcal{L}^2\)


  • Boundedness of a martingale is important for checking convergence
    • Yet boundedness in \(\mathcal{L}^1\) can be difficult to check
    • Boundedness in \(\mathcal{L}^1\): \(\sup_n E(|M_n|) < \infty\)
    • What is the difference between boundedness in \(\mathcal{L}^1\) and integrability \(E(|M_n|) < \infty, \forall n\)?
  • A martingale \(M\) bounded in \(\mathcal{L}^2\) is also bounded in \(\mathcal{L}^1\)
    • Easier to check boundedness in \(\mathcal{L}^2\) due to a Pythagorean formula \[ E(M_n^2) = E(M_0^2) +\sum_{k=1}^n E\left[ (M_k -M_{k-1})^2 \right] \]
  • This chapter also presents neat proofs of:
    • Three-Series Theorem
    • Strong Law of Large Numbers
    • Lévy’s extension of the Borel-Cantelli Lemmas

Martingales in \(\mathcal{L}^2\): orthogonal increments

  • Let \(M=\{M_n\}_{n \ge 0}\) be a martingale in \(\mathcal{L}^2\) so that \(E(M^2_n) < \infty, \forall n\)
  • By martingale property, for positive integers \(s \le t \le u \le v\), we have \[ E(M_v|\mathcal{F}_u) = M_u\quad (a.s.) \]
  • This implies the future increment \(M_v -M_u\) is orthogonal to the present information \(\mathcal{L}^2(\mathcal{F}_u)\), so \[ \langle M_t -M_s, M_v -M_u \rangle = 0 \]
    • Future increment is also orthogonal to the past increment since \(M_t -M_s \in \mathcal{L}^2(\mathcal{F}_u)\)
  • Hence it is possible to express \(M_n\) as a sum of orthogonal increments: \[ M_n = M_0 +\sum_{k=1}^n (M_k -M_{k-1}) \]
  • Pythagoras’s theorem yields (since expectation of cross term vanishes) \[ E(M_n^2) = E(M_0^2) +\sum_{k=1}^n E\left[ (M_k -M_{k-1})^2 \right] \]
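
A minimal sketch to check the Pythagorean formula exactly on a toy example (not from the original notes). The martingale below is an assumption made for illustration: it starts at \(M_0 = 0\) and moves by a fair sign times a step size that depends on the current value, so its increments are orthogonal but not independent. The helper names `step_size`, `paths`, `second_moment` and `sum_of_squared_increments` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def step_size(m):
    # Hypothetical rule: the size of the next move depends on the current value,
    # so the increments are not independent, only conditionally centred.
    return 1 if m >= 0 else 2


def paths(n):
    """Yield every path (M_0, ..., M_n), one per sequence of n fair signs."""
    for signs in product((-1, 1), repeat=n):
        path = [0]
        for s in signs:
            path.append(path[-1] + s * step_size(path[-1]))
        yield path


def second_moment(n):
    """E(M_n^2), computed exactly over the 2^n equally likely paths."""
    weight = Fraction(1, 2 ** n)
    return sum(weight * p[-1] ** 2 for p in paths(n))


def sum_of_squared_increments(n):
    """E(M_0^2) + sum_k E[(M_k - M_{k-1})^2], with M_0 = 0."""
    weight = Fraction(1, 2 ** n)
    return sum(weight * (p[k] - p[k - 1]) ** 2
               for p in paths(n) for k in range(1, n + 1))
```

Comparing `second_moment(n)` with `sum_of_squared_increments(n)` for small `n` should give identical fractions, since the cross terms vanish in expectation.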

Boundedness in \(\mathcal{L}^2\): sum of increments square

  • Theorem 12.1.1 (numbered by order in the section):
    • Let \(M\) be a martingale for which \(M_n \in \mathcal{L}^2, \forall n\)
    • Then \(M\) is bounded in \(\mathcal{L}^2\) if and only if \(\sum E\left[ (M_k -M_{k-1})^2 \right] < \infty\)
    • And when this obtains, \(M_n \rightarrow M_\infty\) almost surely and in \(\mathcal{L}^2\)
      • Note: Williams implicitly assumes the martingale is indexed in discrete time by using \(k-1\)
      • However I think this theorem also holds for continuous time
  • Proof of \(\sup_n E(M_n^2) < \infty \iff \sum E\left[ (M_k -M_{k-1})^2 \right] < \infty\)
    • Use the Pythagorean formula \[ E(M_n^2) = E(M_0^2) +\sum_{k=1}^n E\left[ (M_k -M_{k-1})^2 \right] \]
      • Note: \(E(M_0^2)\) is a finite constant because \(M_0 \in \mathcal{L}^2\), so it does not affect whether \(\sup_n E(M_n^2)\) is finite
      • So the theorem is safe even though \(E(M_0^2)\) does not appear explicitly in the summability condition

  • Proof of \(M_n \rightarrow M_{\infty}\) almost surely and in \(\mathcal{L}^2\)
    • Suppose that \(M\) is bounded in \(\mathcal{L}^2\)
    • By monotonicity of norms, \(M\) is also bounded in \(\mathcal{L}^1\)
    • Applying Doob’s convergence theorem, we have \(M_n \stackrel{a.s.}{\rightarrow} M_\infty\)
    • The Pythagorean formula implies that \(E\left[ (M_{n+r} -M_n)^2 \right] = \sum_{k=n+1}^{n+r} E\left[ (M_k -M_{k-1})^2 \right]\)
    • When \(r \rightarrow \infty\), Fatou’s lemma yields \(E\left[ (M_\infty -M_n)^2 \right] \le \sum_{k \ge n+1} E\left[ (M_k -M_{k-1})^2 \right]\)
    • Hence \(\lim_n E\left[ (M_\infty -M_n)^2 \right] = 0\), i.e. \(M_n \stackrel{\mathcal{L}^2}{\rightarrow} M_\infty\)
      • Intuition: when \(n \rightarrow \infty\), there is no more increment on RHS

Sum of independent random variables in \(\mathcal{L}^2\)

Sum of independent zero-mean RVs in \(\mathcal{L}^2\)

  • Theorem 12.2.1:
    • Suppose that \(\{X_k\}_{k \in \mathbb{N}}\) is a sequence of independent RVs with zero-mean and finite variance \(\sigma_k^2\)
    • Then \(\sum \sigma_k^2 < \infty \implies \sum X_k\) converges almost surely
    • Further if \(X_k\) is bounded by some positive constant \(K\), then the reverse direction is also true
      • i.e. \(\sum X_k\) converges almost surely \(\implies \sum \sigma_k^2 < \infty\)
  • Notation: define
    • Natural filtration: \(\mathcal{F}_n := \sigma(X_1, X_2, \dots, X_n)\) where \(\mathcal{F}_0 := \{\varnothing, \Omega\}\)
    • Partial sum: \(M_n := \sum_{k=1}^n X_k\) where \(M_0 := 0\)
    • \(A_n := \sum_{k=1}^n \sigma_k^2\) where \(A_0 := 0\)
    • \(N_n := M_n^2 -A_n\) where \(N_0 := 0\)

  • Proof of \(\sum \sigma_k^2 < \infty \implies \sum X_k\) converges almost surely
    • From example in 10.4, \(M\) is a martingale
    • Using the Pythagorean formula, \[ E(M_n^2) = \sum_{k=1}^n E\left[ (M_k -M_{k-1})^2 \right] = \sum_{k=1}^n E(X_k^2) = \sum_{k=1}^n\sigma_k^2 = A_n \]
    • If \(\sum \sigma_k^2 < \infty\), then \(M\) is bounded in \(\mathcal{L}^2\) and \(M_n\) converges almost surely by theorem 12.1.1

  • Proof of \(\sum X_k\) converges almost surely \(\implies \sum \sigma_k^2 < \infty\)
    • Since \(X_k \perp \mathcal{F}_{k-1}\), we have, almost surely \[ E\left[ (M_k -M_{k-1})^2 | \mathcal{F}_{k-1} \right] = E[X_k^2 | \mathcal{F}_{k-1}] = E(X_k^2) = \sigma_k^2 \]
    • Similarly, since \(M_{k-1}\) is \(\mathcal{F}_{k-1}\) measurable, we can expand \((M_k -M_{k-1})^2\), almost surely \[ \sigma_k^2 = E(M_k^2 | \mathcal{F}_{k-1}) -2M_{k-1} E(M_k | \mathcal{F}_{k-1}) +M_{k-1}^2 = E(M_k^2 | \mathcal{F}_{k-1}) -M_{k-1}^2 \]
    • But this implies that \(N\) is a martingale (Recall \(N_n := M_n^2 -A_n\))
    • Now let \(c \in (0,\infty)\) and \(T := \inf\{ r:|M_r|>c\}\)
    • Since stopped martingale is also a martingale, \(E(N_n^T) = E\left[ (M_n^T)^2 \right] -E(A_{T \land n}) =0\)
    • By the further condition, we have \(|M_T-M_{T-1}| =|X_T| \le K\) if \(T < \infty\)
    • Hence \(E(A_{T \land n}) = E\left[ (M_n^T)^2 \right] \le (K+c)^2, \forall n\)
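      • Intuition: \(|M_r| \le c\) for \(r < T\), and the jump at \(T\) overshoots \(c\) by at most \(K\), so \(|M_n^T| \le K+c\)
    • However, since \(\sum X_k\) converges a.s., the partial sums are a.s. bounded
    • So it must be the case that \(P(T = \infty) > 0\) for some \(c\). Since \(A\) is deterministic, \(A_n P(T = \infty) \le E(A_{T \land n}) \le (K+c)^2\) for all \(n\), hence \(A_\infty := \sum \sigma_k^2 < \infty\)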

Random signs

  • Let \(\{a_n\}\) be a sequence of real numbers and \(\{\epsilon_n\}\) be a sequence of iid Rademacher RVs
    • Rademacher distribution: \(P(\epsilon_n = \pm 1) = 0.5\)
    • Rademacher RVs appear frequently in statistical learning theory
  • Theorem 12.2.1 tells us that \(\sum \epsilon_n a_n\) converges a.s. \(\iff \sum a_n^2 < \infty\)
    • And \(\sum \epsilon_n a_n\) oscillates infinitely if \(\sum a_n^2 = \infty\)
  • Sketch
    • Note that \(Var(\epsilon_k a_k) = a_k^2\) and \(|\epsilon_k a_k| \le \sup_n |a_n|\), so theorem 12.2.1 yields the first part
      • \(\sup_n |a_n| < \infty\) in the converse direction because convergence of \(\sum \epsilon_n a_n\) forces \(|a_n| = |\epsilon_n a_n| \rightarrow 0\)
    • For the second part, my guess is since \(\sum a_n^2 = \infty\), \(\sum \epsilon_n a_n\) will not converge
    • However, as \(\epsilon_n\) are Rademacher RVs, \(\sum \epsilon_n a_n\) will oscillate depending on the realization
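
A small sketch contrasting the two regimes for random signs (an illustration, not part of the original notes). It uses the Pythagorean identity \(E[(M_n -M_m)^2] = \sum_{k=m+1}^n a_k^2\) for \(M_n = \sum_{k \le n} \epsilon_k a_k\). The choices \(a_k = 1/k\) and \(a_k = 1/\sqrt{k}\), the seed, and all function names are assumptions made for the example.

```python
import random


def tail_variance(a, m, n):
    """E[(M_n - M_m)^2] = sum_{k=m+1}^{n} a(k)^2 for M_n = sum_k eps_k a(k)."""
    return sum(a(k) ** 2 for k in range(m + 1, n + 1))


def sample_partial_sums(a, n, seed=0):
    """One simulated path M_1, ..., M_n with iid fair signs (seed is arbitrary)."""
    rng = random.Random(seed)
    total, path = 0.0, []
    for k in range(1, n + 1):
        total += rng.choice((-1, 1)) * a(k)
        path.append(total)
    return path


def harmonic(k):
    return 1.0 / k            # sum of a_k^2 is finite


def inv_sqrt(k):
    return 1.0 / k ** 0.5     # sum of a_k^2 is infinite
```

For `harmonic` the tail variance between indices 1000 and 2000 is tiny, so the partial sums are Cauchy in \(\mathcal{L}^2\). For `inv_sqrt` the same block still contributes roughly \(\log 2\), which is consistent with the partial sums failing to settle.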

Symmetrization: expanding the sample space

  • What if the mean of RVs is non-zero?
  • Lemma 12.4.1
    • Suppose \(\{X_n\}\) is a sequence of independent RVs bounded by a constant \(K \in [0,\infty)\)
    • Then \(\sum X_n\) converges a.s. implies that \(\sum E(X_n)\) converges and \(\sum Var(X_n) < \infty\)
  • Proof
    • If \(E(X_n)=0, \forall n\), then this reduces to theorem 12.2.1
    • Otherwise we need to replace each \(X_n\) by a “symmetrized version” \(Z_n^*\) of mean 0
    • Let \(\big(\tilde{\Omega}, \tilde{\mathcal{F}}, \tilde{\mathbb{P}}, (\tilde{X}_n: n \in \mathbb{N}) \big)\) be an exact copy of \(\big(\Omega, \mathcal{F}, \mathbb{P}, (X_n: n \in \mathbb{N}) \big)\)
    • Define a richer probability space \(\big(\Omega^*, \mathcal{F}^*, \mathbb{P}^* \big) := \big(\Omega, \mathcal{F}, \mathbb{P} \big) \times \big(\tilde{\Omega}, \tilde{\mathcal{F}}, \tilde{\mathbb{P}} \big)\)
    • For \(\omega^* = (\omega, \tilde{\omega}) \in \Omega^*\), define \[ X_n^*(\omega^*) := X_n(\omega), \tilde{X}_n^*(\omega^*) := \tilde{X}_n(\tilde{\omega}), Z_n^*(\omega^*) := X_n^*(\omega^*) -\tilde{X}_n^*(\omega^*) \]
      • Intuition: \(X_n^*\) is \(X_n\) lifted to the richer probability space

  • Proof (continue)
    • It is clear that the combined family \((X_n^*: n \in \mathbb{N}) \cup (\tilde{X}_n^*: n \in \mathbb{N})\) is a family of independent RVs on \(\big(\Omega^*, \mathcal{F}^*, \mathbb{P}^* \big)\)
      • This may be proved by the uniqueness lemma in 1.6
    • Both \(X_n^*\) and \(\tilde{X}_n^*\) have the same \(\mathbb{P}^*\)-distribution as the \(\mathbb{P}\)-distribution of \(X_n\): \[ \mathbb{P}^* \circ (X_n^*)^{-1} = \mathbb{P} \circ X_n^{-1} \textrm{ on } (\mathbb{R}, \mathcal{B}), \textrm{etc.} \]
    • Now \((Z_n^*: n \in \mathbb{N})\) is a zero-mean sequence of independent RVs on \(\big(\Omega^*, \mathcal{F}^*, \mathbb{P}^* \big)\)
    • We have \(|Z_n^*(\omega^*)| \le 2K, \forall n, \forall \omega^*\) and \(Var(Z_n^*) = 2 \sigma_n^2\) where \(\sigma_n^2 := Var(X_n)\)
      • By independence of \(X_n^*\) and \(\tilde{X}_n^*\): \(Var(Z_n^*) = Var(X_n^*) +Var(\tilde{X}_n^*) = 2 \sigma_n^2\)
    • Let \(G := \{ \omega \in \Omega: \sum X_n(\omega) \textrm{ converges} \}\) with \(\tilde{G}\) defined similarly
    • Since \(\mathbb{P}(G) =\tilde{\mathbb{P}}(\tilde{G}) =1\), \(\mathbb{P}^*(G \times \tilde{G})=1\)
    • But \(\sum Z_n^*(\omega^*)\) also converges on \(G \times \tilde{G}\), which means \(\mathbb{P}^*(\sum Z_n^* \textrm{ converges})=1\)
    • As \(\sum Z_n^*\) converges a.s. and the \(Z_n^*\) are zero-mean and bounded, theorem 12.2.1 yields \(\sum \sigma_n^2 < \infty\)
    • Theorem 12.2.1 then also shows that \(\sum [X_n -E(X_n)]\) converges a.s., hence \(\sum E(X_n) = \sum X_n -\sum [X_n -E(X_n)]\) converges
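
A minimal sketch of the symmetrization step on a finite example (not from the original notes). It builds the law of \(Z = X -\tilde{X}\) on the product space and checks the zero mean, the doubled variance and the bound \(2K\). The two-point law `x_law` and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def mean(dist):
    return sum(p * x for x, p in dist.items())


def variance(dist):
    mu = mean(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())


def symmetrize(dist):
    """Law of Z = X - X~ on the product space, X~ an independent copy of X."""
    out = {}
    for (x, p), (y, q) in product(dist.items(), repeat=2):
        out[x - y] = out.get(x - y, 0) + p * q
    return out


# Hypothetical bounded law with non-zero mean: P(X=0)=1/4, P(X=3)=3/4, so K=3.
x_law = {0: Fraction(1, 4), 3: Fraction(3, 4)}
z_law = symmetrize(x_law)
```

Here `mean(z_law)` is exactly zero and `variance(z_law)` is exactly twice `variance(x_law)`, matching \(Var(Z_n^*) = 2\sigma_n^2\).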

Some lemmas on real numbers

Cesàro’s lemma

  • Alternative version of Stolz–Cesàro theorem

  • Suppose that \(\{b_n\}\) is a sequence of strictly positive real numbers with \(b_0:=0\) and \(b_n \uparrow \infty\)

  • \(\{v_n\}\) is a convergent sequence of real numbers with \(v_n \rightarrow v_\infty \in \mathbb{R}\)

  • Then we have \(\lim_{n \rightarrow \infty} \frac{1}{b_n} \sum_{k=1}^n (b_k -b_{k-1}) v_k = v_\infty\)

  • Proof: let \(\epsilon > 0\). Choose \(N\) s.t. \(v_k > v_\infty -\epsilon\) whenever \(k \ge N\). Then \[ \begin{aligned} \liminf_{n \rightarrow \infty} \frac{1}{b_n} \sum_{k=1}^n (b_k -b_{k-1}) v_k &\ge \liminf_{n \rightarrow \infty} \left[ \frac{1}{b_n} \sum_{k=1}^N (b_k -b_{k-1}) v_k +\frac{b_n -b_N}{b_n} (v_\infty -\epsilon) \right] \\ &\ge 0 +v_\infty -\epsilon \end{aligned} \]

  • Since this is true for every \(\epsilon > 0\), we have \(\liminf \ge v_\infty\)

  • By a similar argument, we have \(\limsup \le v_\infty\) and the result follows

Kronecker’s lemma

  • Suppose that \(\{b_n\}\) is a sequence of strictly positive real numbers with \(b_n \uparrow \infty\)

  • \(\{x_n\}\) is a sequence of real numbers and define \(s_n := \sum_{i=1}^n x_i\)

  • Then we have \(\sum \frac{x_n}{b_n}\) converges \(\implies \frac{s_n}{b_n} \rightarrow 0\)

  • Proof: let \(u_n := \sum_{k \le n} \frac{x_k}{b_k}\) so that \(u_\infty := \lim_{n \rightarrow \infty} u_n\) exists

  • Then \(u_n -u_{n-1} = \frac{x_n}{b_n}\). Thus by rearrangement \[ s_n = \sum_{k=1}^n b_k(u_k -u_{k-1}) = b_n u_n -\sum_{k=1}^n (b_k-b_{k-1}) u_{k-1} \]

  • Applying Cesàro’s lemma, we have \(\frac{s_n}{b_n} \rightarrow u_\infty -u_\infty = 0\)

  • Alternative version: \(\sum x_n\) exists and is finite \(\implies \lim_{n \rightarrow \infty} \frac{1}{b_n} \sum_{k=1}^n b_k x_k =0\)

    • Useful for showing that a weighted sum with monotonically increasing weights \(b_k\) is \(o(b_n)\)
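
A quick numerical illustration of Kronecker’s lemma (a sketch, not from the original notes). The choice \(b_k = k\), \(x_k = (-1)^k \sqrt{k}\) is an assumption made for the example: \(\sum x_k / b_k\) is a convergent alternating series, while \(s_n\) itself is unbounded. The function name `kronecker_check` is hypothetical.

```python
def kronecker_check(n):
    """Return (partial sum of x_k/b_k, s_n/b_n) for b_k = k, x_k = (-1)^k sqrt(k)."""
    series, s = 0.0, 0.0
    for k in range(1, n + 1):
        x = (-1) ** k * k ** 0.5
        series += x / k      # sum of (-1)^k / sqrt(k): converges (alternating series)
        s += x               # s_n = x_1 + ... + x_n, which is unbounded
    return series, s / n
```

Calling `kronecker_check` with increasing `n` should show the first component stabilising and the second shrinking towards zero, as the lemma predicts.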

Some neat proofs of classical theorems

Kolmogorov’s Three-Series Theorem

  • Let \(\{X_n\}\) be a sequence of independent RVs
  • Then \(\sum X_n\) converges a.s. iff for some (then for every) \(K>0\), the following 3 properties hold:
    • \(\sum_n P(|X_n| > K) < \infty\)
    • \(\sum_n E(X_n^K)\) converges
    • \(\sum_n Var(X_n^K) < \infty\) where \[ \begin{aligned} X_n^K(\omega) &:= \left\{ \begin{array}{ll} X_n(\omega) &, |X_n(\omega)| \le K \\ 0 &, |X_n(\omega)| > K \end{array} \right. \end{aligned} \]
  • Proof of “only if” part
    • Suppose that \(\sum X_n\) converges a.s. and \(K\) is any constant in \((0,\infty)\)
    • Since \(X_n \rightarrow 0\) a.s., \(|X_n| > K\) for only finitely many \(n\) (a.s.), so BC2 (which applies by independence) shows the first property holds
      • BC2: \(\sum P(|X_n| > K)=\infty \implies P(|X_n| > K, \textrm{ i.o.})=1\)
      • Contraposition: \(P(|X_n| > K, \textrm{ i.o.})=0 \implies \sum P(|X_n| > K)<\infty\)
    • Since (a.s.) \(X_n = X_n^K\) for all but finitely many \(n\), \(\sum X_n^K\) also converges a.s.
    • Applying lemma 12.4.1 yields the other two properties

  • Proof of “if” part
    • Suppose that for some \(K>0\) the 3 properties hold
    • Then \(\sum P(X_n \ne X_n^K) = \sum P(|X_n| > K) < \infty\) by construction and property 1
    • Applying BC1 yields \(P(X_n = X_n^K \textrm{ for all but finitely many } n) = 1\)
    • So we only need to check \(\sum X_n^K\) converges a.s.
    • By property 2, we can check if \(\sum \left[ X_n^K -E(X_n^K) \right]\) converges a.s. instead
    • Now note that \(Y_n^K := X_n^K -E(X_n^K)\) is a zero-mean RV with \(E\left[ (Y_n^K)^2 \right] = Var(X_n^K)\)
    • By property 3, the result follows from theorem 12.2.1

A Strong Law under variance constraints

  • Lemma 12.8.1
    • Let \(\{W_n\}\) be a sequence of independent RVs with \(E(W_n)=0, \sum \frac{Var(W_n)}{n^2} < \infty\)
    • Then \(\frac{1}{n} \sum_{k \le n} W_k \stackrel{a.s.}{\rightarrow} 0\)
  • Proof
    • By Kronecker’s lemma, it suffices to prove that \(\sum \frac{W_n}{n}\) converges almost surely
    • However \(E \left( \frac{W_n}{n} \right) = 0, \sum Var \left( \frac{W_n}{n} \right) = \sum \frac{Var(W_n)}{n^2} < \infty\)
    • So by theorem 12.2.1, the statement is proved

Kolmogorov’s Truncation Lemma

  • Suppose that \(X_1, X_2, \dots\) are iid RVs with the same distribution as \(X\) where \(E(|X|) < \infty\)
  • Define \[ \mu := E(X), Y_n := \left\{ \begin{array}{ll} X_n &, |X_n| \le n \\ 0 &, |X_n| > n \end{array} \right. \]
  • Then
    • \(E(Y_n) \rightarrow \mu\)
    • \(P(Y_n = X_n \textrm{ eventually}) = 1\)
    • \(\sum \frac{Var(Y_n)}{n^2} < \infty\)

  • Proof of \(E(Y_n) \rightarrow \mu\)
    • Let \[ Z_n := \left\{ \begin{array}{ll} X &, |X| \le n \\ 0 &, |X| > n \end{array} \right. \]
    • Then \(Z_n \stackrel{d}{=} Y_n\) and \(E(Z_n)=E(Y_n)\)
    • When \(n \rightarrow \infty\), we have \(Z_n \rightarrow X, |Z_n| \le |X|\)
    • Applying dominated convergence theorem (note that \(X\) is integrable by assumption): \[ \lim_{n \rightarrow \infty} E(Y_n) = \lim_{n \rightarrow \infty} E(Z_n) = E(X) = \mu \]

  • Proof of \(P(Y_n = X_n \textrm{ eventually}) = 1\)
    • Note that \[ \begin{aligned} \sum_{n=1}^\infty P(Y_n \ne X_n) &= \sum_{n=1}^\infty P(|X_n| > n) = \sum_{n=1}^\infty P(|X| > n) \\ &= E\left( \sum_{n=1}^\infty I_{|X| > n} \right) = E\left( \sum_{1 \le n < |X|} 1 \right) \\ &\le E(|X|) < \infty \end{aligned} \]
    • By BC1, \(P(Y_n \ne X_n, \textrm{ i.o.}) = 0\). In other words, \(P(Y_n = X_n, \textrm{ e.v.}) = 1\)

  • Proof of \(\sum \frac{Var(Y_n)}{n^2} < \infty\)
    • We have \[ \sum \frac{Var(Y_n)}{n^2} \le \sum \frac{E(Y_n^2)}{n^2} = \sum_n \frac{E(|X|^2 ; |X| \le n)}{n^2} = E\left[ |X|^2 f(|X|) \right] \]
      • where \(f(z) = \sum_{n \ge \max(1,z)} \frac{1}{n^2}, 0 < z < \infty\)
    • Note that, for \(n \ge 1\), \(\frac{1}{n^2} \le \frac{2}{n(n+1)} = 2 \left( \frac{1}{n} -\frac{1}{n+1} \right)\)
    • Hence \(f(z) \le \frac{2}{\max(1,z)}\) by telescoping
    • We have \(\sum \frac{Var(Y_n)}{n^2} \le 2E(|X|) < \infty\)
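
A numerical sanity check of the telescoping bound \(f(z) \le \frac{2}{\max(1,z)}\) (a sketch, not from the original notes). The function name `f_upper` and the default number of terms are assumptions. The remainder of the series is controlled by \(\sum_{n \ge N} \frac{1}{n^2} \le \frac{1}{N-1}\), so the returned value is an upper bound for \(f(z)\) up to floating-point error.

```python
import math


def f_upper(z, terms=100_000):
    """Upper bound for f(z) = sum_{n >= max(1, z)} 1/n^2.

    Sums `terms` terms explicitly and bounds the remainder by 1/(N - 1),
    where N is the first index left out of the explicit sum.
    """
    start = max(1, math.ceil(z))
    partial = sum(1.0 / n ** 2 for n in range(start, start + terms))
    return partial + 1.0 / (start + terms - 1)
```

For example, `f_upper(1)` is close to \(\pi^2/6 \approx 1.645\), comfortably below the bound \(2\).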

Kolmogorov’s Strong Law of Large Numbers

  • Let \(X_1, X_2, \dots\) be iid RVs with \(E(|X_k|) < \infty, \forall k\). Define \(S_n := \sum_{k=1}^n X_k\) and \(\mu := E(X_k), \forall k\)

  • Then \(\frac{1}{n} S_n \stackrel{a.s.}{\rightarrow} \mu\)

  • Proof

    • Define \(Y_n\) as in Kolmogorov’s Truncation Lemma
    • By \(P(Y_n = X_n, \textrm{ e.v.}) = 1\), it suffices to show that \(\frac{1}{n} \sum_{k=1}^n Y_k \stackrel{a.s.}{\rightarrow} \mu\)
    • Define \(W_k := Y_k -E(Y_k)\). Note that \[ \frac{1}{n} \sum_{k=1}^n Y_k = \frac{1}{n} \sum_{k=1}^n E(Y_k) +\frac{1}{n} \sum_{k=1}^n W_k \]
    • The first term \(\frac{1}{n} \sum_{k=1}^n E(Y_k) \rightarrow \mu\) by \(E(Y_n) \rightarrow \mu\) and Cesàro’s lemma (let \(b_n := n\))
    • The second term \(\frac{1}{n} \sum_{k=1}^n W_k \stackrel{a.s.}{\rightarrow} 0\) by \(\sum \frac{Var(Y_n)}{n^2} < \infty\) and lemma 12.8.1
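
A simulation sketch of the objects used in the proof (an illustration, not from the original notes). It draws iid samples, builds the truncated variables \(Y_k\) from the Truncation Lemma, and reports both sample means together with the number of indices where \(Y_k \ne X_k\). The Exp(1) distribution (so \(\mu = 1\)), the seed and the name `running_means` are assumptions.

```python
import random


def running_means(n, seed=0):
    """Sample means of X_k and of the truncated Y_k = X_k * 1{|X_k| <= k}."""
    rng = random.Random(seed)          # arbitrary seed, for reproducibility only
    sum_x = sum_y = 0.0
    mismatches = 0                     # number of k <= n with Y_k != X_k
    for k in range(1, n + 1):
        x = rng.expovariate(1.0)       # Exp(1) draws, so mu = E(X) = 1
        y = x if abs(x) <= k else 0.0  # truncation at level k
        mismatches += (y != x)
        sum_x += x
        sum_y += y
    return sum_x / n, sum_y / n, mismatches
```

With a large `n` both means should be close to \(1\), and the mismatch count should stay small, in line with \(P(Y_n = X_n, \textrm{ e.v.}) = 1\).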

Some remarks on SLLN

  • Philosophy
    • SLLN gives a precise formulation of \(E(X)\) as “the mean of a large number of independent realizations of X”
      • Long run guarantee of frequentist method
    • From exercise E4.6, it can be shown that if \(E(|X|)=\infty\), then \(\limsup \frac{|S_n|}{n} = \infty\) almost surely
    • Hence SLLN is the best possible result for iid RVs
  • Methodology
    • The truncation technique seems “ad hoc” with no pure-mathematical elegance
    • The proofs via martingale theory or ergodic theory possess that elegance
    • However, each of the methods can be adapted to cover situations which the others cannot tackle
    • Classical truncation arguments retain great importance

Decomposition of stochastic process

Doob decomposition

  • Theorem 12.11.1
    • Let \(\{X_n\}_{n \in \mathbb{Z^+}}\) be an adapted process in \(\mathcal{L}^1\)
    • Then \(X\) has a Doob decomposition \(X = X_0 +M +A\)
      • where \(M\) is a martingale null at \(0\) and \(A\) is a previsible process null at \(0\)
    • Moreover, this decomposition is unique modulo indistinguishability in the sense that \[ X = X_0 +\tilde{M} +\tilde{A} \implies P(M_n =\tilde{M}_n, A_n =\tilde{A}_n, \forall n) = 1 \]
    • Continuous time analogue: Doob-Meyer decomposition
  • Corollary 12.11.2
    • \(X\) is a submartingale iff \(A\) is an increasing process in the sense that \(P(A_n \le A_{n+1}, \forall n) = 1\)
    • Similarly, \(X\) is a supermartingale if and only if \(A\) is almost surely decreasing

  • Proof of existence
    • If \(X\) has Doob decomposition \(X = X_0 +M +A\), we have \[ \begin{aligned} E(X_n -X_{n-1} | \mathcal{F}_{n-1}) &= E(M_n -M_{n-1} | \mathcal{F}_{n-1}) +E(A_n -A_{n-1} | \mathcal{F}_{n-1}) \\ &= 0 +(A_n -A_{n-1}) \end{aligned} \]
    • Hence we can define \(A\) by \(A_n = \sum_{k=1}^n E(X_k -X_{k-1} | \mathcal{F}_{k-1})\) a.s.
      • \(A\) represents the sum of expected increments of \(X\)
      • \(M\) can be defined by \(M_n = \sum_{k=1}^n \left[ X_k -E(X_k | \mathcal{F}_{k-1}) \right]\), which adds up the surprises
    • The corollary is now obvious from the definition of \(A\)
  • Proof of uniqueness
    • Define \(Y := M-\tilde{M} = A -\tilde{A}\) by rearranging the other decomposition
    • The first equality implies that \(Y\) is a martingale and \(E(Y_n | \mathcal{F}_{n-1}) = Y_{n-1}\) a.s.
    • The second equality implies that \(Y\) is also previsible and \(E(Y_n | \mathcal{F}_{n-1}) = Y_n\) a.s.
    • Since \(Y_0 = 0\) by construction, this implies that \(Y_n = 0\) a.s.
    • which also means that the decomposition is almost surely unique
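
A minimal sketch of the Doob decomposition on a coin-flip tree (not from the original notes). The process is given as a function of the flips seen so far, and \(A\) accumulates the conditional expected increments along a path. The fair \(\pm 1\) flips, the example \(X_n = S_n^2\) for a simple random walk \(S\), and all names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def cond_mean_next(x, prefix):
    """E(X_k | F_{k-1}) at the node `prefix`, for fair +-1 coin flips."""
    return Fraction(x(prefix + (1,)) + x(prefix + (-1,)), 2)


def doob(x, path):
    """Return (M_n, A_n) along `path`, where X = X_0 + M + A."""
    a = Fraction(0)
    for k in range(1, len(path) + 1):
        prev = path[:k - 1]
        a += cond_mean_next(x, prev) - x(prev)   # E(X_k - X_{k-1} | F_{k-1})
    return x(path) - x(()) - a, a


def squared_walk(prefix):
    # Hypothetical example process: X_n = S_n^2 for a simple random walk S.
    return sum(prefix) ** 2


# Values taken by A_4 over all 16 paths of length 4.
compensators = {doob(squared_walk, p)[1] for p in product((-1, 1), repeat=4)}
```

For this example the compensator is deterministic, \(A_n = n\), so `compensators` contains the single value 4 and \(M_n = S_n^2 -n\).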

The angle-brackets process \(\langle M \rangle\)

  • Let \(M\) be a martingale in \(\mathcal{L}^2\) and null at \(0\)
  • The conditional form of Jensen’s inequality shows that \(M^2\) is a submartingale
    • Square function is convex as the second derivative is non-negative
    • \(E(M_n^2|\mathcal{F}_{n-1}) \ge \left[ E(M_n|\mathcal{F}_{n-1}) \right]^2 = M_{n-1}^2\)
  • Thus \(M^2\) has a Doob decomposition \(M^2 = N +A\)
    • where \(N\) is a martingale null at \(0\) and \(A\) is a previsible increasing process null at \(0\)
    • \(A\) is often written as \(\langle M \rangle\) (quadratic variation in stochastic calculus)
  • Since \(E(M_n^2) = E(A_n)\), \(M\) is bounded in \(\mathcal{L}^2 \iff E(A_\infty) < \infty\)
    • where \(A_\infty := \uparrow \lim A_n\), a.s.
    • \(E(N_n) = E\left[ E(N_n | \mathcal{F}_0) \right] = E(N_0) = 0\) (martingale property)
  • It is important to note that \(A_n -A_{n-1} = E(M_n^2 -M_{n-1}^2 | \mathcal{F}_{n-1}) = E\left[ (M_n -M_{n-1})^2 | \mathcal{F}_{n-1} \right]\)
    • As the cross term is \(-E(2M_n M_{n-1}|\mathcal{F}_{n-1}) = -2M_{n-1}^2\)

Relating convergence of \(M\) to finiteness of \(\langle M \rangle_\infty\)

  • Theorem 12.13.1
    • Let \(M\) be a martingale in \(\mathcal{L}^2\) and null at \(0\). Let \(A\) be “a version of” \(\langle M \rangle\)
    • Then \(A_\infty (\omega) < \infty \implies \lim_{n \rightarrow \infty} M_n(\omega)\) exists
    • Suppose that \(M\) has uniformly bounded increments in that for some \(K \in \mathbb{R}\), \[ |M_n(\omega) -M_{n-1}(\omega)| \le K, \forall n, \forall \omega \]
    • Then \(\lim_{n \rightarrow \infty} M_n(\omega)\) exists \(\implies A_\infty (\omega) < \infty\)
  • Remark
    • Theorem 12.13.1 is an extension of 12.2.1
      • Doob convergence theorem + 12.2.1 with different conditions

  • Proof of \(A_\infty (\omega) < \infty \implies \lim_{n \rightarrow \infty} M_n(\omega)\) exists
    • Since \(A\) is previsible, \(S(k) := \inf \big\{ n \in \mathbb{Z^+}: A_{n+1} > k \big\}\) is a stopping time for every \(k \in \mathbb{N}\)
    • The stopped process \(A^{S(k)}\) is also previsible because for \(B \in \mathcal{B}, n \in \mathbb{N}\) \[ \big\{ A_{n \land S(k)} \in B \big\} = F_1 \cup F_2 \]
      • where \(F_1 := \cup_{r=0}^{n-1} \big\{ S(k)=r; A_r \in B \big\} \in \mathcal{F}_{n-1}\) (case \(S(k) \le n-1\))
      • and \(F_2 := \big\{ A_n \in B \big\} \cap \big\{ S(k) \le n-1 \big\}^c \in \mathcal{F}_{n-1}\) (case \(S(k) \ge n\))
    • Since \(\left( M^{S(k)} \right)^2 -A^{S(k)} = (M^2-A)^{S(k)}\) is a martingale, we have \(\langle M^{S(k)} \rangle = A^{S(k)}\)
      • Why is this not true by definition?
    • As \(A^{S(k)}\) is bounded by \(k\), we have \(E\big[ (M_{n \land S(k)})^2 \big] = E(A_{n \land S(k)}) \le k\), so \(M^{S(k)}\) is bounded in \(\mathcal{L}^2\)
    • Thus \(\lim_n M_{n \land S(k)}\) exists almost surely by Doob convergence theorem
    • However, \(\big\{ A_\infty < \infty \big\} = \cup_k \big\{ S(k)=\infty \big\}\)
    • The result now follows on combining \(\lim_n M_{n \land S(k)}\) and \(\big\{ A_\infty < \infty \big\}\)

  • Proof of \(\lim_{n \rightarrow \infty} M_n(\omega)\) exists \(\implies A_\infty (\omega) < \infty\)
    • Suppose that \(P(A_\infty = \infty, \sup_n |M_n| < \infty) > 0\)
    • Then for some \(c>0\), \(P[T(c) = \infty, A_\infty = \infty] > 0\) (since \(M_n\) is bounded)
      • where \(T(c) := \inf \big\{ r : |M_r| > c \big\}\) is a stopping time
    • Now \(E\left[ M_{T(c) \land n}^2 -A_{T(c) \land n} \right] = 0\) and \(M^{T(c)}\) is bounded by \(c+K\)
      • The first one comes from decomposition and martingale property
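      • The second one holds because \(|M_r| \le c\) for \(r < T(c)\) and the jump at \(T(c)\) is at most \(K\)
    • Thus \(E\left[ A_{T(c) \land n} \right] \le (c+K)^2, \forall n\), which implies \(E(A_{T(c)}) \le (c+K)^2 < \infty\) by monotone convergence
    • This contradicts \(P[T(c) = \infty, A_\infty = \infty] > 0\), since \(A_{T(c)} = A_\infty = \infty\) on that event, so we should have \(P(A_\infty = \infty, \sup_n |M_n| < \infty) = 0\)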
  • Remarks
    • The additional assumption of uniformly bounded increments of \(M\) is needed to bound the overshoot of \(M\) at the stopping time \(T(c)\)
    • For \(A\), this is not necessary as the jump \(A_{S(k)} - A_{S(k)-1}\) becomes irrelevant due to previsibility

A trivial “Strong Law” for martingales in \(\mathcal{L}^2\)

  • Let \(M\) be a martingale in \(\mathcal{L}^2\) and null at \(0\). Let \(A\) be “a version of” \(\langle M \rangle\)
  • Since \((1+A)^{-1}\) is a bounded previsible process, we can define a martingale \[ W_n := \sum_{k=1}^n \frac{M_k -M_{k-1}}{1 +A_k} = \left[ (1+A)^{-1} \bullet M \right]_n \]
  • Moreover, since \((1+A_n)\) is \(\mathcal{F}_{n-1}\) measurable, \[ \begin{aligned} E\left[ (W_n -W_{n-1})^2 | \mathcal{F}_{n-1} \right] &= (1+A_n)^{-2} (A_n -A_{n-1}) \\ &\le (1+A_{n-1})^{-1} -(1+A_n)^{-1}, \textrm{ a.s.} \end{aligned} \]
  • We see that \(\langle W \rangle_\infty \le 1\) so \(\lim W_n\) exists a.s. by theorem 12.13.1
  • Applying Kronecker’s lemma shows that \(\frac{M_n}{A_n} \rightarrow 0\) almost surely on \(\{A_\infty = \infty\}\)

Lévy’s extension of the Borel-Cantelli Lemmas

  • Theorem 12.15.1
    • Suppose that for \(n \in \mathbb{N}, E_n \in \mathcal{F}_n\)
    • Define \(Z_n := \sum_{k=1}^n I_{E_k} =\) number of \(E_k(k \le n)\) which occur
    • Also define \(\xi_k := P(E_k|\mathcal{F}_{k-1})\) and \(Y_n := \sum_{k=1}^n \xi_k\)
    • Then we have \(\{Y_\infty < \infty\} \implies \{Z_\infty < \infty\}\) almost surely
    • And \(\{Y_\infty = \infty\} \implies \{\frac{Z_n}{Y_n} \rightarrow 1\}\) almost surely
  • Extension of BC1
    • Since \(E(\xi_k)=P(E_k)\), it follows that if \(\sum P(E_k) < \infty\) then \(Y_\infty < \infty\) a.s. and BC1 follows
  • Extension of BC2
    • Let \(\{E_n\}_{n \in \mathbb{N}}\) be a sequence of independent events associated with some triple \((\Omega, \mathcal{F}, \mathbb{P})\)
    • Define the natural filtration \(\mathcal{F}_n = \sigma (E_1, E_2, \dots, E_n)\)
    • Then \(\xi_k = P(E_k)\) almost surely by independence
    • BC2 follows from \(\{Y_\infty = \infty\} \implies \{\frac{Z_n}{Y_n} \rightarrow 1\}\) a.s.

  • Proof
    • Let \(M\) be the martingale \(Z-Y\), so that \(Z = M +Y\) is the Doob decomposition of \(Z\). Then \[ \begin{aligned} M_n &= Z_n -Y_n = \sum_{k=1}^n \left[ I_{E_k} -\xi_k \right] \\ A_n := \langle M \rangle_n &= \sum_{k=1}^n E\left[ (M_k -M_{k-1})^2 | \mathcal{F}_{k-1} \right] = \sum_{k=1}^n E\left[ (I_{E_k} -\xi_k )^2 | \mathcal{F}_{k-1} \right] \\ &= \sum_{k=1}^n E\left[ I_{E_k} -2I_{E_k} \xi_k +\xi_k^2 | \mathcal{F}_{k-1} \right] = \sum_{k=1}^n \xi_k (1 -\xi_k) \le Y_n, \textrm{ a.s.} \end{aligned} \]
      • Note that \(E(I_{E_k}|\mathcal{F}_{k-1}) = P(E_k|\mathcal{F}_{k-1}) =: \xi_k\)
    • If \(Y_\infty < \infty\), then \(A_\infty < \infty\) and \(\lim M_n\) exists so that \(Z_\infty\) is finite almost surely
    • If \(Y_\infty = \infty\) and \(A_\infty < \infty\), then \(\lim M_n\) still exists and \(\frac{Z_n}{Y_n} \rightarrow 1\) almost surely
    • If \(Y_\infty = \infty\) and \(A_\infty = \infty\), then \(\frac{M_n}{A_n} \rightarrow 0\) almost surely by the trivial “Strong Law” for martingales in \(\mathcal{L}^2\)
    • Hence, a fortiori, \(\frac{M_n}{Y_n} \rightarrow 0\) and \(\frac{Z_n}{Y_n} = \frac{M_n+Y_n}{Y_n} \rightarrow 1\) almost surely
      • A fortiori means “from the stronger argument”

Concluding remarks


  • Independence is important in the study of RVs
  • Martingales relax the assumption of independent RVs to that of orthogonal increments
    • Pythagorean formula in \(\mathcal{L}^2\)
    • Richer probability space for copy of independent RVs
    • Doob decomposition for expected increment and surprise
  • Martingales also relate convergence to finiteness
    • Doob convergence theorem
    • Truncation technique with stopping time
    • \(\langle M \rangle\) from decomposition of \(M^2\)
  • Martingale transform is a possible candidate for control variate in variance reduction
    • Suppose \(\{X_n\}_{n \in \mathbb{N}}\) is a martingale wrt natural filtration \(\mathcal{F}_n\)
    • \(Y_{n+1} := \sum_{i=1}^n g_i(X_1,\dots,X_i)(X_{i+1}-X_i)\) is also a martingale wrt \(\mathcal{F}_n\)
    • Choose \(g\) with high correlation to use \(Y_n\) as control variate
    • See a trivial “Strong Law” for an example of martingale transform
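
A toy sketch of a martingale transform used as a control variate (an illustration, not from the original notes). The setting is an assumption: \(X_i\) is a simple random walk, the target is \(E(X_{n+1}^2)\), and the weight is \(g_i(X_1,\dots,X_i) = 2X_i\). In this special case \(X_{n+1}^2 -Y_{n+1} = n+1\) on every path, so the control variate removes all of the variance. Function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def walk(signs):
    """X_1, ..., X_{n+1}: partial sums of fair +-1 signs (a martingale)."""
    xs, total = [], 0
    for s in signs:
        total += s
        xs.append(total)
    return xs


def transform(xs, g):
    """Y_{n+1} = sum_{i=1}^{n} g(X_1..X_i) * (X_{i+1} - X_i)."""
    return sum(g(xs[:i]) * (xs[i] - xs[i - 1]) for i in range(1, len(xs)))


def g(history):
    # Hypothetical weight chosen for the target f(X) = X_{n+1}^2.
    return 2 * history[-1]


n = 5
outcomes = [walk(s) for s in product((-1, 1), repeat=n + 1)]
# Exact mean of the transform over all equally likely sign sequences.
mean_y = Fraction(sum(transform(xs, g) for xs in outcomes), len(outcomes))
# Values taken by X_{n+1}^2 - Y_{n+1} across all paths.
residuals = {xs[-1] ** 2 - transform(xs, g) for xs in outcomes}
```

Here `mean_y` is exactly zero, as for any martingale transform null at \(0\), and `residuals` is the single value \(n+1\). In realistic problems the weight \(g\) would only be approximately optimal, so the variance is reduced rather than eliminated.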