Summer 2021
From the book:
play with chances, either large (in terms of convergence probability) or small (in terms of winning a big prize at the casino).
From Wikipedia:
use randomness to solve problems that might be deterministic in principle.
My additional opinion:
learn through generalization.
Difference between “Monte Carlo” in statistics and in computer science:
From my STAT3005 Q&A:
For approximation of \(\pi\):
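A minimal hit-or-miss sketch (my own, not from the Q&A): a uniform point on \([0,1]^2\) lands in the quarter disc with probability \(\pi/4\), so four times the sample frequency estimates \(\pi\).

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_pi(n: int) -> float:
    """Hit-or-miss Monte Carlo: P(X^2 + Y^2 <= 1) = pi/4 for (X, Y) uniform on [0, 1]^2."""
    x, y = rng.uniform(size=(2, n))
    return 4.0 * np.mean(x**2 + y**2 <= 1.0)

print(estimate_pi(1_000_000))  # ~3.1416, with O(1/sqrt(n)) Monte Carlo error
```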
Suppose we wish to evaluate \(\int f(x) \ \mathrm{d}x\), where \(f(x) = g(x) p(x)\). Then \[ \int_{-\infty}^\infty f(x) \ \mathrm{d}x = \int_{-\infty}^\infty g(x) p(x) \ \mathrm{d}x = \mathbb{E}\{ g(X) \}, \] where \(X \sim p\), i.e., \(X\) is a random variable whose pdf is \(p\). Under suitable conditions, the SLLN gives us \[ \bar{g}_n := \frac{1}{n} \sum_{i=1}^n g(X_i) \stackrel{\mathrm{a.s.}}{\to} \int_{-\infty}^\infty f(x) \ \mathrm{d}x, \] where \(X_1, \ldots, X_n \stackrel{\mathrm{iid}}{\sim} p\). Some notable difficulties: (i) drawing iid samples from \(p\) may be hard, e.g., when \(p\) is high-dimensional or known only up to a normalizing constant; (ii) the variance of \(g(X)\) may be large (or even infinite), so \(\bar{g}_n\) can converge very slowly.
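A minimal sketch of the plain Monte Carlo estimator \(\bar{g}_n\) (my own illustration; here \(g(x) = x^2\) and \(p\) is the standard normal pdf, so the true value of the integral is \(1\)):

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_integral(g, sampler, n: int) -> float:
    """Estimate E[g(X)] = int g(x) p(x) dx by the sample mean of g over iid draws from p."""
    return float(np.mean(g(sampler(n))))

# g(x) = x^2 with p = N(0, 1), so the integral equals Var(X) = 1.
est = mc_integral(lambda x: x**2, lambda n: rng.standard_normal(n), 10_000)
print(est)  # close to 1; the Monte Carlo error is O(1/sqrt(n))
```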
The book chose to introduce the Gibbs sampler first
I do not agree
Why MCMC instead of classical MC?
What if \(p(\cdot)\) is a \(d\)-dimensional pdf? Drawing iid samples from a high-dimensional \(p\) is often infeasible, whereas MCMC only requires simulating a Markov chain whose stationary distribution is \(p\); a minimal sketch follows.
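A random-walk Metropolis sketch (my own illustration, not from the book), which needs \(p\) only up to a normalizing constant; the target here is a 2-dimensional standard normal:

```python
import numpy as np

rng = np.random.default_rng(2)

def rw_metropolis(log_p, x0, n: int, step: float = 0.5) -> np.ndarray:
    """Random-walk Metropolis: propose x' = x + step * N(0, I); accept w.p. min(1, p(x')/p(x))."""
    d = len(x0)
    chain = np.empty((n, d))
    x, lp = np.asarray(x0, dtype=float), log_p(x0)
    for i in range(n):
        prop = x + step * rng.standard_normal(d)
        lp_prop = log_p(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # MH acceptance (symmetric proposal)
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

# Target: unnormalized log-density of a 2-d standard normal.
chain = rw_metropolis(lambda x: -0.5 * np.sum(np.asarray(x)**2), x0=[0.0, 0.0], n=50_000)
print(chain.mean(axis=0))  # each coordinate's mean should be near 0
```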
(Theorem 15.1) Suppose that the Markov chain has transition kernel \(K\) and stationary distribution \(\pi\) such that \(K\) is \(\pi\)-irreducible and aperiodic. Then, for all \(x \in D=\{x: \pi(x) > 0\}\), the following hold:
(Theorem 15.2) The conditions of Theorem 15.1 hold for the Gibbs sampler provided that \(F\) is lower semicontinuous at zero, \(D\) is connected, and both \(f_X(\cdot)\) and \(f_Y(\cdot)\) are locally bounded.
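To make the two-component setting of Theorem 15.2 concrete (my own sketch), a Gibbs sampler for a bivariate normal with correlation \(\rho\), where both full conditionals are univariate normals:

```python
import numpy as np

rng = np.random.default_rng(3)

def gibbs_bvn(rho: float, n: int) -> np.ndarray:
    """Gibbs sampler for (X, Y) bivariate standard normal with correlation rho.
    Full conditionals: X | Y = y ~ N(rho * y, 1 - rho^2), and symmetrically for Y | X."""
    s = np.sqrt(1.0 - rho**2)
    chain = np.empty((n, 2))
    x = y = 0.0
    for i in range(n):
        x = rho * y + s * rng.standard_normal()  # draw from f_{X|Y}(. | y)
        y = rho * x + s * rng.standard_normal()  # draw from f_{Y|X}(. | x)
        chain[i] = (x, y)
    return chain

chain = gibbs_bvn(rho=0.9, n=50_000)
print(np.corrcoef(chain.T)[0, 1])  # sample correlation, near 0.9
```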
Suppose we simulate \(\boldsymbol{X} = (X^{(1)}, X^{(2)}, X^{(3)})^\intercal\) with Gibbs. Some possible strategies are: (i) direct Gibbs, updating \(X^{(1)}\), \(X^{(2)}\), \(X^{(3)}\) one at a time from their full conditionals; (ii) grouped Gibbs, updating a block such as \((X^{(2)}, X^{(3)})\) jointly; (iii) collapsed Gibbs, integrating \(X^{(3)}\) out and running Gibbs on \((X^{(1)}, X^{(2)})\) only.
Which strategy is the best?
How to prove the inequality \(\left\lVert\mathbf{F}_{\mathrm{collapse}}\right\rVert \le \left\lVert\mathbf{F}_{\mathrm{group}}\right\rVert \le \left\lVert\mathbf{F}_{\mathrm{direct}}\right\rVert\)? One route (cf. Liu, Wong and Kong, 1994) expresses each forward operator through conditional expectations and uses the fact that a conditional expectation is an \(L^2\) projection, hence norm-nonincreasing.
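As a numerical illustration of the three strategies (my own sketch, not from the book), take the target to be a trivariate normal with equicorrelation \(\rho\) and compare the lag-1 autocorrelation of the \(X^{(1)}\)-chain; the inequality above suggests that collapsing should mix best and the direct scheme worst:

```python
import numpy as np

rng = np.random.default_rng(7)
rho, n = 0.9, 100_000

def lag1_acf(x):
    """Lag-1 sample autocorrelation of a chain."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

# Target: (X1, X2, X3) ~ N(0, Sigma) with Sigma = (1 - rho) I + rho * 11'.

# Direct Gibbs: update one coordinate at a time from its full conditional
# Xi | rest ~ N(rho/(1+rho) * (sum of other two), (1-rho)(1+2rho)/(1+rho)).
a, s = rho / (1 + rho), np.sqrt((1 - rho) * (1 + 2 * rho) / (1 + rho))
x1 = x2 = x3 = 0.0
direct = np.empty(n)
for i in range(n):
    x1 = a * (x2 + x3) + s * rng.standard_normal()
    x2 = a * (x1 + x3) + s * rng.standard_normal()
    x3 = a * (x1 + x2) + s * rng.standard_normal()
    direct[i] = x1

# Grouped Gibbs: draw the block (X2, X3) | X1 jointly, then X1 | (X2, X3).
cov23 = np.array([[1 - rho**2, rho - rho**2], [rho - rho**2, 1 - rho**2]])
L = np.linalg.cholesky(cov23)
x1 = 0.0
grouped = np.empty(n)
for i in range(n):
    x2, x3 = rho * x1 + L @ rng.standard_normal(2)
    x1 = a * (x2 + x3) + s * rng.standard_normal()
    grouped[i] = x1

# Collapsed Gibbs: integrate X3 out; (X1, X2) is bivariate normal with corr rho.
t = np.sqrt(1 - rho**2)
x1 = x2 = 0.0
collapsed = np.empty(n)
for i in range(n):
    x1 = rho * x2 + t * rng.standard_normal()
    x2 = rho * x1 + t * rng.standard_normal()
    collapsed[i] = x1

print(lag1_acf(collapsed), lag1_acf(grouped), lag1_acf(direct))  # expect (roughly) increasing
```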
Briefly discussed in my STAT4010 tutorial:
Expectation–maximization (EM) algorithm
To obtain the MLE, we need to optimize the likelihood function, which may not have a closed form in some models. In that case, we may replace the expectation step of the EM algorithm with an MCMC procedure (Monte Carlo EM); see the description here.
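A toy Monte Carlo EM sketch (entirely my own, hypothetical model): \(Y_i \mid b_i \sim N(b_i, 1)\) with latent \(b_i \sim N(\theta, 1)\). Here the E-step is actually available in closed form (so the exact MLE is \(\bar{y}\), a handy check), but we approximate it by sampling from \(b_i \mid y_i, \theta \sim N((\theta + y_i)/2, 1/2)\) as a stand-in for the MCMC draws one would use in harder models:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate data: Y_i = b_i + eps_i with b_i ~ N(theta_true, 1) and eps_i ~ N(0, 1).
theta_true, n = 2.0, 500
y = theta_true + rng.standard_normal(n) + rng.standard_normal(n)

theta = 0.0
for _ in range(50):
    # MC E-step: draw latent b_i from its conditional N((theta + y_i)/2, 1/2)
    # (in harder models these draws would come from an MCMC sampler instead).
    b = ((theta + y) / 2)[None, :] + np.sqrt(0.5) * rng.standard_normal((200, n))
    # M-step: Q(theta) = E[sum_i log N(b_i; theta, 1)] is maximized at the mean of the draws.
    theta = b.mean()

print(theta, y.mean())  # MCEM estimate vs. the exact MLE y-bar
```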
Stochastic approximation (SA) algorithm
Instead of using MCMC and a Riemann sum to approximate an expectation, we can use MCMC to generate the required random effect in the SA algorithm; see the description here. (Perhaps the best-known SA algorithm is stochastic gradient descent.)
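A minimal Robbins–Monro sketch (my own illustration, not from the tutorial): to solve \(\mathbb{E}\{h(\theta, X)\} = 0\) in \(\theta\) from noisy evaluations only, iterate \(\theta_{t+1} = \theta_t - \gamma_t h(\theta_t, X_t)\) with step sizes \(\gamma_t = 1/t\):

```python
import numpy as np

rng = np.random.default_rng(5)

def robbins_monro(h, sample, theta0: float, n: int) -> float:
    """Stochastic approximation: theta_{t+1} = theta_t - (1/t) * h(theta_t, X_t),
    targeting the root of E[h(theta, X)] = 0."""
    theta = theta0
    for t in range(1, n + 1):
        theta -= (1.0 / t) * h(theta, sample())
    return theta

# E[h(theta, X)] = theta - mu for X ~ N(mu, 1), so the root is theta* = mu = 3.
print(robbins_monro(lambda th, x: th - x, lambda: 3.0 + rng.standard_normal(), 0.0, 10_000))
```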
Machine learning
MC or MCMC methods can be used to enhance or even construct machine learning algorithms. For example, the algorithm behind AlphaGo includes Monte Carlo tree search.
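A compact UCT-style Monte Carlo tree search sketch (my own toy illustration, vastly simpler than what AlphaGo uses) for a Nim-like game; positions with \(n \bmod 4 = 0\) are known losses for the player to move, which gives an easy correctness check:

```python
import math
import random

random.seed(6)

# Toy game: a pile of n stones; players alternately remove 1, 2, or 3 stones,
# and whoever takes the last stone wins. From n = 10 the winning move is to
# take 2, leaving the opponent a multiple of 4.

def moves(n):
    return [m for m in (1, 2, 3) if m <= n]

class Node:
    def __init__(self, n):
        self.n, self.children = n, {}    # children keyed by move
        self.visits, self.wins = 0, 0.0  # wins from this node's player-to-move view

def rollout(n):
    """Random playout; returns 1.0 if the player to move at n wins, else 0.0."""
    if n == 0:
        return 0.0  # the opponent just took the last stone
    return 1.0 - rollout(n - random.choice(moves(n)))

def simulate(node):
    """One UCT iteration (select / expand / rollout / back up); returns the
    result from the perspective of the player to move at `node`."""
    node.visits += 1
    if node.n == 0:
        result = 0.0
    else:
        untried = [m for m in moves(node.n) if m not in node.children]
        if untried:  # expansion, then a random rollout from the new child
            m = random.choice(untried)
            child = node.children[m] = Node(node.n - m)
            child.visits += 1
            child_result = rollout(child.n)
            child.wins += child_result
            result = 1.0 - child_result
        else:  # UCB1 selection; child win rates are from the opponent's view
            child = max(node.children.values(),
                        key=lambda c: (1.0 - c.wins / c.visits)
                                      + math.sqrt(2.0 * math.log(node.visits) / c.visits))
            result = 1.0 - simulate(child)
    node.wins += result
    return result

def best_move(n, iters=20_000):
    root = Node(n)
    for _ in range(iters):
        simulate(root)
    # Best move = the child that leaves the opponent the lowest win rate.
    return max(root.children.items(),
               key=lambda kv: 1.0 - kv[1].wins / kv[1].visits)[0]

print(best_move(10))  # expected winning move: take 2
```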