In this blog post we are concerned with a specific problem: we have a Monte-Carlo type model that produces some simulated output. From this we want to estimate an arbitrary statistic (for example percentiles, expected shortfall, or more complicated statistics relating to many variables). We know, however, that a statistic calculated in this way is just one realisation from a distribution of outcomes. We would like to be able to say something about this distribution; in particular we would like to have some idea of the variability in the statistic.

The "obvious" (and most accurate) way to do this would be to re-run the model many times with different random seeds and create an empirical distribution of the statistic. However if the model is particularly complex this might mean a lot of compute time. Instead we would like to find an approximate method that does not require us to re-run the model at all. To do this in a general way we will introduce the Jackknife method.

It is worth noting that in certain situations other methods can be easier to implement or more accurate. For example, if we only wanted to estimate the 90th percentile we could construct an argument using a binomial distribution (and the normal approximation thereof). However, that approach does not generalise to (for example) expected shortfall or even more complicated statistics.
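For reference, a minimal sketch of that binomial argument (my own illustration, not from the original post; samples is a placeholder for the model output): the number of observations below the true 90th percentile is Binomial(N, 0.9), so under a normal approximation the order statistics at ranks $0.9N \pm z\sqrt{0.9 \cdot 0.1 \cdot N}$ bracket it.

import numpy as np
from scipy.stats import norm

def percentile_ci(samples, p=0.90, conf=0.95):
    """
    Sketch: distribution-free confidence interval for the p-th quantile
    using the binomial / order-statistic argument (illustration only)
    """
    x = np.sort(np.asarray(samples))
    N = x.shape[0]
    z = norm.ppf(1 - (1 - conf) / 2)
    lo = int(np.floor(N*p - z*np.sqrt(N*p*(1-p))))
    hi = int(np.ceil(N*p + z*np.sqrt(N*p*(1-p))))
    return x[max(lo, 0)], x[min(hi, N - 1)]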

Justification of the Method

We begin by supposing we have $N$ unordered observations $\{ X_i \}_{i=1}^{N}$ from our Monte-Carlo model. We use these observations to create an estimate of a statistic $\hat{Q}$; we denote the estimate from the model by $\hat{Q}_{\{1:N\}}$, which is itself a random variable. Through a Taylor expansion we can note: $$ \mathbb{E}\left(\hat{Q}_{\{1:N\}}\right) = \hat{Q} + \frac{a_1}{N} + \frac{a_2}{N^2} + \dots$$ for some constants $a_1, a_2, \dots$
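As a concrete illustration (my example, not part of the original argument): the plug-in variance estimator $\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2$ has expectation $$\mathbb{E}\left(\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2\right) = \sigma^2 - \frac{\sigma^2}{N}$$ so in this case $a_1 = -\sigma^2$ and all higher-order coefficients are zero.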

If we now consider partitioning the $N$ observations as $N = mk$ for integers $m$ and $k$ - that is, we create $m$ blocks of $k$ observations ($k \gg m$) - we can define the sets $A_i = \{ X_j \;|\; j \leq (i-1)k \;\text{ or }\; j > ik \}$. Each $A_i$ contains all observations bar a different block of $k$, so $|A_i| = (m-1)k$. Via a second-order approximation we can then write: $$ \mathbb{E}\left(\hat{Q}_{A_i}\right) \approx \hat{Q} + \frac{a_1}{(m-1)k} + \frac{a_2}{(m-1)^2k^2} $$

We then define the pseudo-values $\hat{q}_i = m\hat{Q}_{\{1:N\}} - (m-1)\hat{Q}_{A_i}$. Using the second-order approximations above: $$\mathbb{E}\left( \hat{q}_i \right) \approx m\left( \hat{Q} + \frac{a_1}{N} + \frac{a_2}{N^2}\right) - (m-1)\left(\hat{Q} + \frac{a_1}{(m-1)k} + \frac{a_2}{(m-1)^2k^2}\right)$$ Through some simplification this becomes: $$\mathbb{E}\left( \hat{q}_i \right) \approx \hat{Q} - \frac{a_2}{m(m-1)k^2} $$ Asymptotically this has bias $O(N^{-2})$. If the estimator only has bias $O(N^{-1})$ (i.e. $a_i = 0$ for $i \geq 2$) then the pseudo-values are unbiased.
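To spell out the simplification: using $N = mk$ and grouping terms by coefficient, $$\mathbb{E}\left( \hat{q}_i \right) \approx \hat{Q}\left(m - (m-1)\right) + \frac{a_1}{k}\left(1 - 1\right) + \frac{a_2}{k^2}\left(\frac{1}{m} - \frac{1}{m-1}\right) = \hat{Q} - \frac{a_2}{m(m-1)k^2}$$ so the $a_1$ terms cancel exactly and only the second-order term survives.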

We can now define two new variables:
$\hat{\hat{Q}}_N = \frac{1}{m} \sum_{i=1}^{m} \hat{q}_i$
$\hat{\hat{V}}_N = \frac{1}{m-1} \sum_{i=1}^{m} \left( \hat{q}_i - \hat{\hat{Q}}_N \right)^2$
Then via the CLT we have that $\hat{\hat{Q}}_N$ is approximately distributed as $\mathcal{N}(\hat{Q}, \hat{V})$ for some unknown variance $\hat{V}$. We can thus create an $X\%$ confidence interval as:
$\hat{Q} \in \left[ \hat{\hat{Q}}_N - t \sqrt{\frac{\hat{\hat{V}}_N}{m}}, \hat{\hat{Q}}_N + t \sqrt{\frac{\hat{\hat{V}}_N}{m}} \right] $
with $t$ the two-sided critical value of a Student-t distribution with $(m-1)$ degrees of freedom at confidence level $X$ (i.e. its $1 - \frac{1-X}{2}$ quantile).

We can see from the construction of this confidence interval that no restriction has been placed on the type of statistic $\hat{Q}$; in this sense the method is generic.

(Note that this is strongly related to the bootstrap; in fact the jackknife can be viewed as a first-order approximation to a bootstrap.)

An Example

The explanation above is quite notation-dense, so it will be easier to look at an example. We will take one of the simplest examples of a Monte-Carlo model: estimating the value of $\pi$. To do this we take a unit square with a quarter of a unit circle inside it:

We will simulate random points uniformly within the square and calculate the proportion $p$ of points landing within the quarter circle (red). If we simulate $N$ points and $c$ of them land inside the quarter circle, then $p = c/N$ and we get the estimate $\pi \approx 4p$. A simple vectorised numpy implementation can be seen below:

# Estimating pi using Monte-Carlo
import numpy as np

def points_in_circle(N):
    """
    Returns an array that contains 1 if a random point (x,y)
    is within the unit quarter circle, or 0 otherwise
    
    N: Number of simulations (int)
    
    Random seed fixed for repeatability
    """
    np.random.seed(123)
    x = np.random.random(N)
    y = np.random.random(N)
    r = np.sqrt(x**2 + y**2)
    p = r < 1
    return p*1

def est_pi(arr):
    """
    Return an estimate of pi using the output of points_in_circle
    """
    return arr.sum() / arr.shape[0] * 4

SIMS = 10000
pts = points_in_circle(SIMS)
print("Estimate of pi:", est_pi(pts))
Estimate of pi: 3.1456

We can now use the jackknife method to construct a confidence interval. (Obviously in this case the statistic is a simple sample average, so the CLT applies directly and in practice we wouldn't use a jackknife here; but for this example I wanted something simple so as not to distract attention from the jackknife method itself.) We know what the result "should" be in this example, however we shall pretend we don't have access to np.pi (or similar).

We can code up an implementation of the jackknife method as:

from scipy.stats import t

def jackknife_pi(arr, pi_fn, m):
    """
    This function implements the jackknife method outlined above
    The function takes an array (arr) and an estimate function (pi_fn)
    and a number of discrete buckets (m) - in this implementation m 
    needs to divide size(arr) exactly
    
    The function returns Q-double hat, V-double hat, (m-1)
    """
    Qn = pi_fn(arr)
    N = arr.shape[0]
    itr = np.arange(N)
    k = N // m  # block size; m must divide N exactly
    q = np.zeros(m)
    
    for i in range(m):
        ID = (itr < i*k) | (itr >= (i+1)*k)
        temp = arr[ID]
        q[i] = m*Qn - (m-1)*pi_fn(temp)
    
    Qjk = q.sum() / m
    v = (q - Qjk)**2
    Vjk = v.sum() / (m - 1)
    
    return Qjk, Vjk, (m-1)

def conf_int(Q, V, X, dof):
    """
    Print an X% confidence interval given the jackknife outputs:
    Q (Q-double hat), V (V-double hat) and dof = m - 1
    """
    tpt = t.ppf(1-(1-X)/2, dof)
    up = Q + tpt*np.sqrt(V/(dof+1))
    down = Q - tpt*np.sqrt(V/(dof+1))
    print(round(X*100),"% confidence interval: [",down,",",up,"]")

jk_pi = jackknife_pi(pts, est_pi, 10)
Qtest = jk_pi[0]
Vtest = jk_pi[1]
pct = 0.95
dof = jk_pi[2]
conf_int(Qtest, Vtest, pct, dof)
95 % confidence interval: [ 3.103649740420917 , 3.187550259579082 ]
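As a quick sanity check (a sketch of the brute-force approach from the introduction, not part of the original workflow; points_in_circle_seeded is a hypothetical variant of points_in_circle that accepts a seed), we could compare the jackknife interval with the empirical spread of the estimator over many re-runs:

def points_in_circle_seeded(N, seed):
    """
    Hypothetical variant of points_in_circle with a controllable seed,
    used only to brute-force the distribution of the estimator
    """
    rng = np.random.default_rng(seed)
    x = rng.random(N)
    y = rng.random(N)
    return (np.sqrt(x**2 + y**2) < 1) * 1

reruns = np.array([est_pi(points_in_circle_seeded(SIMS, s)) for s in range(200)])
print("Empirical standard deviation over re-runs:", reruns.std(ddof=1))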

We can see that the 95% confidence interval is fairly wide (around 3.10 to 3.19) in this case. We will now show that by using a "better" method we can reduce this range. We start by reconsidering the square and quarter circle: we notice that we can add two additional squares to the setup:

Apart from looking like a Mondrian painting, we can notice that all points generated in the yellow area will add "1" to the estimator array and all points in the black area will add "0". The only areas of "contention" are the blue/red rectangles; if we focus on generating points only in these areas we will increase the accuracy of the estimator. By symmetry the two rectangles are identical, so we only need to generate points within the rectangle with corners $\left(\frac{1}{\sqrt{2}},0\right)$, $\left(1,0\right)$, $\left(1,\frac{1}{\sqrt{2}}\right)$ and $\left(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}}\right)$. This has area $\frac{\sqrt{2}-1}{2}$, so both rectangles together have total area $\sqrt{2}-1$. Therefore generating $N$ points within these rectangles is equivalent to generating $\frac{N}{\sqrt{2}-1}$ points in the original scheme (approximately 2.4 times as many). This should reduce the standard deviation of the estimate by roughly a third (by the CLT). We can code this estimator up in a similar way to before:

def points_in_rect(N):
    """
    Returns an array that contains 1 if a random point (x,y)
    is within the unit circle, or 0 otherwise
    
    This uses the "improved" method
    
    N: Number of simulations (int)
    
    Random seed fixed for repeatability
    """
    np.random.seed(123)
    x = np.random.random(N) * (1 - 1 / np.sqrt(2)) + (1/np.sqrt(2))
    y = np.random.random(N) / np.sqrt(2)
    r = np.sqrt(x**2 + y**2)
    p = r < 1
    return p*1

def est_pi_2(arr):
    """
    Return an estimate of pi using the output of points_in_rect
    This applies a correction since points are only simulated
    in smaller rectangles
    """
    pct = arr.sum() / arr.shape[0]
    approx_pi = (pct*(np.sqrt(2)-1) + 0.5)*4
    return approx_pi

SIMS = 10000
pts = points_in_rect(SIMS)
print("Estimate of pi:", est_pi_2(pts))

jk_pi = jackknife_pi(pts, est_pi_2, 10)
Qtest = jk_pi[0]
Vtest = jk_pi[1]
pct = 0.95
dof = jk_pi[2]
conf_int(Qtest, Vtest, pct, dof)
Estimate of pi: 3.144554915549336
95 % confidence interval: [ 3.1247314678035196 , 3.164378363295143 ]

As expected, we can see that the confidence interval has narrowed significantly with the improved estimator.

Notes on Implementation

As we have seen, the jackknife is a general tool, although in certain situations more specialised methods exist. There are a couple of points to keep in mind when implementing the method:

  • The selection of m: this is fairly arbitrary. If the statistic being analysed is very cumbersome to calculate (or if we wish to run the method over very many statistics) then a smaller choice of m is helpful, at the cost of fewer degrees of freedom in the resulting interval (see the sketch after this list).
  • Properties of the statistic: in some instances we know the statistic must be bounded (e.g. a correlation coefficient must lie in $[-1,1]$). This additional information can and should be used to improve the confidence interval; deciding how to present the results of this method often becomes more art than science.
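As a rough illustration of the first point, here is a sketch (reusing points_in_circle, est_pi and jackknife_pi from above; the particular values of m are arbitrary and each must divide SIMS exactly):

# Sketch: sensitivity of the jackknife interval to the choice of m
pts_m = points_in_circle(SIMS)
for m in [5, 10, 20, 50]:
    Qjk, Vjk, dof = jackknife_pi(pts_m, est_pi, m)
    half_width = t.ppf(0.975, dof) * np.sqrt(Vjk / m)
    print("m =", m, "estimate:", Qjk, "95% half-width:", half_width)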

Conclusion

In this post we have seen what a jackknife method is, why it works and a basic implementation. Hopefully the power these methods hold for reporting on the results of a Monte-Carlo simulation is now clear. In more sophisticated situations they can give real insight into which parts of a model suffer most from simulation error, and how confident we should be in the estimates it produces.