Consider the computation of the sample mean \(\bar{X}_n = n^{-1} \sum_{i=1}^n X_i\) when the observations \(X_1, X_2, \ldots\) arrive one at a time. There are two ways to compute \(\bar{X}_n\) when \(X_n\) arrives:
1. Non-recursive update: recompute the mean from all stored observations, i.e., \(\bar{X}_n = n^{-1}\sum_{i=1}^n X_i\), which takes \(O(n)\) time and requires keeping all the data.
2. Recursive update: update the previous value in \(O(1)\) time via \(\bar{X}_n = \{(n-1)\bar{X}_{n-1} + X_n\}/n\), which only requires storing \((n-1, \bar{X}_{n-1})\).
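For a quick check with the illustrative values \(X_1 = 1\), \(X_2 = 3\) and \(X_3 = 5\), the two routes give the same answer: \[ \bar{X}_2 = \frac{1+3}{2} = 2, \qquad \bar{X}_3 = \frac{2\bar{X}_2 + X_3}{3} = \frac{4+5}{3} = 3 = \frac{1+3+5}{3}. \]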
We confirm in R that the two updates agree and compare their speed:
barX_nr = function(x){ # non-recursive: recompute the mean from all data in O(length(x)) time
  mean(x)
}
barX_r = function(x, n, barX){ # recursive: O(1) update of the previous mean barX of n points with the new point x
  (n*barX + x)/(n+1)
}
n = 30000
x = runif(n)
out = matrix(nrow=n, ncol=2)
system.time({
for (i in 1:n)
out[i,1] = barX_nr(x[1:i])
})["elapsed"] # speed of non-recursive update
## elapsed
## 2.61
system.time({
out[1,2] = x[1]
for (i in 2:n)
out[i,2] = barX_r(x[i], i-1, out[i-1,2])
})["elapsed"] # speed of recursive update
## elapsed
## 0.05
all.equal(out[,1], out[,2]) # the running estimates are the same
## [1] TRUE
Note that the above setting arises frequently with learning algorithms, e.g., Markov chain Monte Carlo (MCMC, which will be taught in the next lecture) and stochastic gradient descent (SGD). In particular, it is crucial to diagnose convergence (e.g., using the central limit theorem) in order to stop the learning procedure at a reasonable point in time with confidence. However, while the recursive formula for \(\bar{X}_n\) is well known, the recursive formula for \(\mathrm{Var}(\bar{X}_n)\) is less well known (especially when the \(X_i\) are serially dependent).
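To illustrate the kind of CLT-based monitoring mentioned above, here is a minimal sketch assuming independent draws (so the plain sample variance applies, rather than the long-run variance needed for MCMC/SGD); the function name run_until_tol and the tolerance tol are illustrative and not part of the lecture code. The idea is to update \((n, Q_n, \bar{X}_n)\) recursively and stop once the half-width of the 95% confidence interval \(\bar{X}_n \pm 1.96\, s_n/\sqrt{n}\) drops below tol.
run_until_tol = function(draw, tol = 0.01, n_min = 100, n_max = 1e6){
  x = draw() # first observation initializes the stored vector (n, Q, barX)
  n = 1; Q = x^2; barX = x
  half = Inf
  while (n < n_max) {
    x = draw()
    Q = Q + x^2 # recursive updates, as derived below
    barX = (n*barX + x)/(n+1)
    n = n + 1
    s2 = (Q - n*barX^2)/(n-1) # recursive sample variance
    half = 1.96*sqrt(s2/n) # CLT-based 95% half-width
    if (n >= n_min && half < tol) break
  }
  c(n = n, mean = barX, half_width = half)
}
run_until_tol(function() runif(1)) # e.g., estimate E(X) for X ~ Uniform(0, 1)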
As we have not covered the long-run variance (see Remark 7.6) in the lecture yet, we are not going to discuss its recursive formula this time. Instead, I will go through the derivation of a recursive formula for the sample variance \(s_n^2\) (which, when the \(X_i\) are independent, estimates \(\mathrm{Var}(X_1)\), so that \(s_n^2/n\) estimates \(\mathrm{Var}(\bar{X}_n)\)) to get you familiar with recursive estimation.
To begin with, the non-recursive formula for the sample variance is \[ s_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i -\bar{X}_n)^2. \] Updating it in \(O(1)\) time is non-trivial because \(\bar{X}_n\) changes whenever a new observation arrives. In light of this, we decompose the sum into basic components: \[ \begin{aligned} \sum_{i=1}^n (X_i -\bar{X}_n)^2 &= \sum_{i=1}^n X_i^2 -2\bar{X}_n \sum_{i=1}^n X_i +n\bar{X}_n^2 \\ &=: Q_n -2\bar{X}_n (n\bar{X}_n) +n\bar{X}_n^2 \\ &= Q_n -n\bar{X}_n^2. \end{aligned} \] As a result, we can update \(s_n^2\) recursively as follows. At stage \(n\), we store \((n, Q_n, \bar{X}_n)\). When \(n=1\), the initial vector is \((1, X_1^2, X_1)\). At stage \(n+1\), we update the vector by:
1. \(Q_{n+1} = Q_n + X_{n+1}^2\);
2. \(\bar{X}_{n+1} = (n\bar{X}_n + X_{n+1})/(n+1)\);
3. \(n \leftarrow n+1\).
Output: \(s_{n+1}^2 = \{Q_{n+1} -(n+1)\bar{X}_{n+1}^2\}/n\).
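As a quick sanity check with the illustrative values \(X_1 = 1\) and \(X_2 = 3\): \(Q_2 = 1 + 9 = 10\), \(\bar{X}_2 = 2\), and hence \(s_2^2 = \{Q_2 - 2\bar{X}_2^2\}/1 = 10 - 8 = 2\), which agrees with the non-recursive formula \(\{(1-2)^2 + (3-2)^2\}/(2-1) = 2\).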
We verify in R that the recursive formula reproduces the non-recursive one and compare their speed:
s_nr = function(x){ # non-recursive: recompute the sample variance from all data
  var(x)
}
s_r = function(x, n, Q, barX){ # recursive: O(1) update given the stored (n, Q, barX) and the new point x
  Q = Q + x^2                  # Q_{n+1} = Q_n + X_{n+1}^2
  barX = (n*barX + x)/(n+1)    # \bar{X}_{n+1}
  s = (Q - (n+1)*barX^2)/n     # s_{n+1}^2
  c(s, n+1, Q, barX)           # return s_{n+1}^2 together with the updated state
}
n = 30000
x = runif(n)
out = matrix(nrow=n, ncol=2)
system.time({
for (i in 2:n)
out[i,1] = s_nr(x[1:i])
})["elapsed"] # speed of non-recursive update
## elapsed
## 5.07
system.time({
comp = c(0, 1, x[1]^2, x[1]) # initial state (s, n, Q, barX); the first entry is a placeholder since s is undefined at n = 1
for (i in 2:n) {
comp = s_r(x[i], comp[2], comp[3], comp[4])
out[i,2] = comp[1]
}
})["elapsed"] # speed of recursive update
## elapsed
## 0.17
all.equal(out[,1], out[,2]) # the running estimates are the same
## [1] TRUE
There are several final remarks.