# Motivating Example

Sometimes my family needs to quickly load up into the car, which may involve putting on shoes, using the restroom, getting dressed, etc. Suppose that $$X$$ is the time in minutes that it takes for the last of my five kids to get in the car. Specifically, $$X = \max\{Y_1,\ldots,Y_n\}$$ where $$\{Y_i\}_{i=1}^5$$ are independent exponentially distributed random variables with rates 0.5, 0.75, 1.0, 0.75, and 2.0. (Aside: The independence assumption is suspect, but let’s go with it here.)

Find the probability that it takes more than 3 minutes to load up.

We could approach this as a mathematical statistics problem, trying to find the density function $$p$$ for $$X$$ and then computing:

$\text{Pr}(X>3) = \int_3^\infty p(x) dx = \int_{-\infty}^\infty \text{I}\{ X > 3 \} p(x) dx = \text{E}( \text{I} \{ X > 3 \} ).$

But there is no known simple solution for finding $$p$$ in this case. Even if we can find $$p$$, we may still have a challenge in computing the integrate analytically.

Instead, let’s solve this problem using Monte Carlo integration.

# Theory of Monte Carlo Integration

First, let’s define Monte Carlo integration in the abstract.

Let $$X$$ be a random variable having density $$p$$.

Suppose we want to calculate:

$\theta = \text{E}(h(X)) = \int h(x) p(x) dx,$

i.e., the expected value of a known function $$h$$ of the random variable $$X$$ having density $$p$$.

Monte Carlo integration approximates $$\theta$$ by obtaining random samples $$X_1,\ldots,X_B$$ from the distribution having density $$p$$, where $$B$$ is a “big” number. Let

$\hat{\theta} = \frac{1}{B} \sum_{i=1}^B h(x_i).$

Note that $$\text{E}(\hat{\theta}) = \theta.$$ Further, by the strong law of large numbers (LLN),

$\hat{\theta} = \frac{1}{B} \sum_{i=1}^B h(X_i) \longrightarrow_{a.s.} \theta, \ \ \text{as} \ B \rightarrow \infty$

Thus, for finite $$B$$, $$\hat{\theta}$$ estimates $$\theta$$. That is, $$\hat{\theta} \approx \theta$$. How good is this estimate? Will I get the same answer in repeated calculations? What is the Monte Carlo error? What is the distribution of $$\hat{\theta}$$?

Since $$\hat{\theta}$$ is a mean calculated from a large random sample size, the central limit theorem (CLT) says that $$\hat{\theta}$$ is approximately normally distributed with mean $$\theta$$. Thus, we can put a confidence interval on our Monte Carlo estimate. We already know that $$\text{E}(\hat{\theta}) = \theta$$, but what is the variance?

\begin{align*} \text{var}(\hat{\theta}) & = \text{var}\bigg(\frac{1}{B} \sum_{i=1}^B h(X_i)\bigg) \\ & = \frac{1}{B^2} \ B \ \text{var}(h(X)) \\ & = \frac{1}{B} \ \text{E}((h(X)-\theta)^2) \\ & \approx \frac{1}{B} \ \frac{1}{B} \ \sum_{i=1}^B (h(x_i)-\theta)^2 \\ & \approx \frac{1}{B(B-1)} \ \sum_{i=1}^B (h(x_i)-\hat{\theta})^2 \\ & = \frac{s^2}{B}, \\ \end{align*}

where $$s^2$$ is the sample variance of $$h(x_1),\ldots,h(x_B)$$.

Finally, recall that many quantities can be expressed in terms of expectations. For example, $$\text{Pr}(X>\tau) = \text{E}(\text{I}\{X>\tau\})$$.

# Application to Example

Here $$h(x) = \text{I}\{ X>3 \}$$.

B <- 100000
times <- sapply(c(0.5, 0.75, 1.0, 0.75, 2.0),function(rate) rexp(B,rate))
x <- apply(times,1,max)
est <- mean(x>3)
cat(sprintf("Monte Carlo estimate is %4.3f.\n",est))
## Monte Carlo estimate is 0.412.
ci <- est + c(-1,1)*qnorm(0.975)*sd(x)/sqrt(B)
cat(sprintf("The 95%% confidence interval is (%4.3f,%4.3f).\n",ci,ci))
## The 95% confidence interval is (0.400,0.423).

If this confidence interval is too wide for your tastes, just increase $$B$$.

What about getting the mean waiting time? There $$h$$ is just the identity function, i.e., $$h(x) = x$$.

est <- mean(x)
cat(sprintf("Monte Carlo estimate is %4.3f.\n",est))
## Monte Carlo estimate is 3.062.
ci <- est + c(-1,1)*qnorm(0.975)*sd(x)/sqrt(B)
cat(sprintf("The 95%% confidence interval is (%4.3f,%4.3f).\n",ci,ci))
## The 95% confidence interval is (3.051,3.074).