# Using the bootstrap for bias reduction

I came across this simple example in Horowitz (2001, p. 3174), which demonstrates that, in these specific circumstances at least, the bias-corrected bootstrap estimator has a substantially lower MSE than the plug-in estimator. The setup is as follows. We have a sample of 10 iid observations, where $$X_{i}\sim N(0,6)$$ (that is, with variance 6). The goal is to estimate $$\theta=\exp(\text{E}[X_{i}])$$, for which the true value is $$\theta=\exp(0)=1$$. The plug-in estimator is $$\hat{\theta}=\exp\left(\frac{1}{10}\sum_{i=1}^{10}X_{i}\right)$$.

Given a realized sample $$\mathbf{x}=(x_{1},\ldots,x_{n})$$ (here $$n=10$$), the usual bootstrap estimates are obtained by resampling $$m$$ times from $$\mathbf{x}$$ with replacement, generating the bootstrap samples $$\mathbf{x}_{j}^{*}=(x_{1j}^{*},\ldots,x_{nj}^{*})$$ and the bootstrap estimates $$\hat{\theta}_{j}^{*}=\exp\left(\frac{1}{10}\sum_{i=1}^{10}x_{ij}^{*}\right)$$. Let $$\hat{\theta}^{*}=\frac{1}{m}\sum_{j=1}^{m}\hat{\theta}_{j}^{*}$$ be the average across all $$\hat{\theta}_{j}^{*}$$. We can then estimate the bias as $$\widehat{\text{Bias}}[\hat{\theta}]=\hat{\theta}^{*}-\hat{\theta}$$. In R code, this is:

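set.seed(1)
# Draw the sample: 10 observations from N(0, 6), i.e. sd = sqrt(6)
data = rnorm(10, 0, sqrt(6))
# Plug-in estimate
(thetahat = exp(mean(data)))
#>  1.382411
# m = 1000 bootstrap estimates, each from a resample with replacement
bs = replicate(1000, {
  resample = sample(data, 10, replace = TRUE)
  exp(mean(resample))
})
# Bootstrap estimate of the bias
(biashat = mean(bs) - thetahat)
#>  0.2973734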

The “debiased” estimate is hence $$\hat{\theta}-\widehat{\text{Bias}}[\hat{\theta}]=2\hat{\theta}-\hat{\theta}^{*}$$. For the sample above, this is $$1.382-0.297=1.085$$, much closer to the true value $$\theta=1$$ than the plug-in estimate of $$1.382$$.
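The same steps can be wrapped in a small reusable function. This is only a sketch: the name `bootstrap_debias` and its arguments are hypothetical and do not come from Horowitz or from any package. It assumes the statistic is a function of a numeric vector that returns a single number.

```r
# Hypothetical helper (not from the original example): bootstrap bias
# correction for an arbitrary scalar statistic of a numeric vector.
bootstrap_debias = function(x, statistic, m = 1000) {
  estimate = statistic(x)
  # m bootstrap estimates, each computed on a resample with replacement
  boot = replicate(m, statistic(sample(x, length(x), replace = TRUE)))
  bias = mean(boot) - estimate
  c(estimate = estimate, bias = bias, debiased = estimate - bias)
}

set.seed(1)
x = rnorm(10, 0, sqrt(6))
result = bootstrap_debias(x, function(v) exp(mean(v)))
result
```

Returning the plug-in estimate, the estimated bias and the debiased value together makes it easy to check that the last entry equals $$2\hat{\theta}-\hat{\theta}^{*}$$.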

Because we control the data-generating process and know the true value of $$\theta$$, we can repeat the above procedure any number of times and obtain approximations of the MSEs of $$\hat{\theta}$$ and $$\hat{\theta}-\widehat{\text{Bias}}[\hat{\theta}]$$. The following code does that for 100 repetitions:

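res = replicate(100, {
  # Draw a fresh sample and compute the plug-in estimate
  data = rnorm(10, 0, sqrt(6))
  thetahat = exp(mean(data))
  # Bootstrap estimates for this sample
  bs = replicate(1000, {
    resample = sample(data, 10, replace = TRUE)
    exp(mean(resample))
  })
  debiased = 2 * thetahat - mean(bs)
  # Errors and squared errors relative to the true value theta = 1
  c(thetahat - 1, debiased - 1, (thetahat - 1)^2, (debiased - 1)^2)
})
# Row means: bias of each estimator, then MSE of each estimator
apply(res, 1, mean)
#>   0.37878143 -0.04919049  1.10729457  0.47833810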

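The first two numbers are the simulated biases and the last two the simulated MSEs. Using the identity $$\text{MSE}[\cdot]=\text{Bias}^{2}[\cdot]+\text{Var}[\cdot]$$, the variance follows as $$\text{MSE}-\text{Bias}^{2}$$, and we obtain the following results: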

| Estimator | MSE | Bias | Variance |
|---|---|---|---|
| $$\hat{\theta}$$ | 1.107 | 0.379 | 0.964 |
| $$\hat{\theta}-\widehat{\text{Bias}}[\hat{\theta}]$$ | 0.478 | -0.049 | 0.476 |
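The variance column can be reproduced from the four simulation averages. The snippet below is a small sketch that hard-codes the numbers printed above; the vector names are chosen for illustration only.

```r
# Averages copied from the simulation output above
bias = c(plug_in = 0.37878143, debiased = -0.04919049)
mse  = c(plug_in = 1.10729457, debiased = 0.47833810)

# Var = MSE - Bias^2
variance = mse - bias^2
round(cbind(MSE = mse, Bias = bias, Variance = variance), 3)
```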

Similar to the results reported in Horowitz (2001, p. 3175), there is a large reduction in both bias and MSE. Not reported by Horowitz, but also substantial, is the reduction in variance. The true bias of $$\hat{\theta}$$ (derived in footnote 1) is $$\exp(0.3) - 1 \approx 0.35$$, so the simulation estimate is not far off.
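As a rough cross-check, the true bias can be computed in closed form and compared with a direct simulation of $$\hat{\theta}$$. The sketch below assumes the log-normal moments $$\text{E}[\hat{\theta}]=\exp(\sigma^{2}/2)$$ and $$\text{E}[\hat{\theta}^{2}]=\exp(2\sigma^{2})$$ for $$\mu=0$$; the variable names and the one million draws are illustrative choices.

```r
sigma2 = 6 / 10                  # variance of the sample mean of 10 N(0, 6) draws
true_mean = exp(sigma2 / 2)      # E[theta_hat] for LogNormal(0, sigma2)
true_bias = true_mean - 1        # roughly 0.35
# E[(theta_hat - 1)^2] = E[theta_hat^2] - 2 E[theta_hat] + 1, about 1.62
true_mse = exp(2 * sigma2) - 2 * true_mean + 1

# Simulate theta_hat directly instead of simulating samples of size 10
set.seed(1)
draws = exp(rnorm(1e6, 0, sqrt(sigma2)))
c(analytic_bias = true_bias, simulated_bias = mean(draws) - 1,
  analytic_mse = true_mse, simulated_mse = mean((draws - 1)^2))
```

Under these assumptions the closed-form MSE of the plug-in estimator is somewhat larger than the value simulated above, which suggests that 100 repetitions still leave noticeable Monte Carlo noise in the MSE estimates.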

Horowitz, Joel L. 2001. “The Bootstrap.” In: Handbook of Econometrics, Volume 5, edited by J. J. Heckman and E. Leamer. Elsevier.

1. Let $$Y = \frac{1}{10} \sum_{i=1}^{10} X_i$$. Then $$Y \sim N(0, 0.6)$$, and $$\hat{\theta}= \exp(Y) \sim \text{LogNormal}(0, 0.6)$$. A log-normal random variable has mean $$\exp \left( \mu + \frac{\sigma^2}{2} \right)$$, hence $$\text{E}[\hat{\theta}] = \exp(0 + 0.6/2) = \exp(0.3)$$.