Note: Thoughts expressed within this workbook are my own and do not represent those of any prior, current, or future employers or affiliates

Background - An Overview of Modelling in Specialty Insurance

Within the specialty insurance space we are typically insuring economic interests against relatively rare, high-impact events (for example: buildings damaged by hurricanes, aircraft crashes, the impact of CEO wrongdoing, and so on). These events are typically broken into two broad classes:

  1. Property
  2. Casualty

Hence the term "P&C" insurer. Property risks are, as the name suggests, related to property - historically physical property, but now potentially non-physical property as well (e.g. data). Owing to the relative simplicity of these risks there is an entire universe of quantitative models for risk management purposes; in particular, a handful of vendors create "natural catastrophe" (nat-cat) models. These models are sophisticated and essentially all rely on GIS-style modelling: a portfolio of insured risks is placed on a geographical map (using lat-long co-ordinates), then "storm tracks" representing possible hurricane paths are run through the portfolio, resulting in a statistical distribution of loss estimates. Similar methods are used for other threats such as earthquakes, typhoons and wildfires.

These nat-cat models allow for fairly detailed risk management procedures. For example, they allow insurers to look for "hot spots" of exposure and then limit exposure growth in those areas. They allow for counter-factual analysis: what would happen if the hurricane from last year had taken a slightly different track? They allow insurers to consider the marginal impact of certain portfolios, for example: if we take on a portfolio a competitor is giving up, will it aggregate or diversify against our current portfolio? As a result of this explanatory power natural catastrophe risks are now well understood; for all intents and purposes these risks are now commodified, which has allowed insurance linked securities (ILS) to form.

Before this analytics boom specialty insurers made their money in natural catastrophe and property insurance; now that this business is commodified, much of the recent growth has been on the Casualty side of the business. Unfortunately the state of modelling on that side is, to put it politely, not quite at the same level.

As one would expect, nat-cat model vendors have tried, and continue to try, to force the casualty business into their existing natural catastrophe models. This is a recipe for disaster, as the network structure of something like the economy does not naturally lend itself to a geographic spatial representation. There is also a big problem with available data. Physical property risks give rise to data that is easy to cultivate; casualty data is either hard or impossible to find - why would any corporation want to divulge all the details of its interactions? As such it does not appear that these approaches will become useful tools in this space.

To fill this void there has been an increasing movement of actuaries into casualty risk modelling roles. While this overcomes some of the problems facing the nat-cat models, it introduces a whole new set of issues. Traditional actuarial models rely on statistical curve fitting to macro-level data. Even assuming a suitable distribution function can be constructed, it is of limited use for risk management as it only informs us of the "what" but not the "why", making it hard to orient a portfolio towards a specific result. More recently actuaries have slowly begun to model individual deals at a micro-level and aggregate them to get a portfolio view. To do this a "correlation matrix" is typically employed; this approach also has issues:

  1. Methods don't scale well with size: adding new risks often requires the entire model to be recalibrated, taking time and effort.
  2. They either require a lot of parameters or are unable to capture multi-factor dependency (e.g. a double trigger policy where each trigger has its own sources of accumulation).
  3. It is usually not possible to vary the nature of the dependency (e.g. to add tail dependence or non-central dependency).
  4. Results are often meaningless in the real world; it is usually impossible to perform counter-factual analysis.

To bridge this gap I have developed a modelling framework that allows for the following:

  1. Modelling occurs at an individual insured interest level
  2. Modelling is scalable in the sense that adding new insured interests requires relatively few new parameters and calibrations
  3. Counter-factual analysis is possible and the model can be interpreted in terms of the real world
  4. The framework itself is highly parallelizable: whereas nat-cat models require teams of analysts, large servers and IT infrastructure, this framework lends itself to being run by multiple people on regular desktop computers with few additional workflow requirements

A First Step: A Simple Driver Method

We will now look at a very stylised model of aggregation that will form a foundation on which we can build the more sophisticated model framework. We call this method of applying dependence a "driver method"; it is standard practice for applying dependence in banking credit risk models, where many thousands of risks may be modelled within a portfolio. The interpretation is that there is a central "driver"; each individual risk is "driven" by it, and since it is common to all risks an induced dependence relation exists between them.

The model relies on the generalised inverse transform method of generating random variates. Stated very simply: if you apply the inverse CDF of a random variable to a random number (a U[0,1] variate) you will obtain samples distributed as that random variable. Therefore, in order to apply dependence in a general form, we only need to apply dependence between U[0,1] variates. We will also exploit the fact that normal distributions are closed under addition (that is, the sum of normals is normal).
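As a small stand-alone illustration of the inverse transform step (a minimal sketch, before any dependence is applied; the gamma marginal here is just a placeholder):

# Minimal illustration of the inverse transform method on its own
# Applying a gamma inverse CDF to U[0,1] variates yields gamma-distributed samples

import numpy as np
from scipy.stats import gamma

U = np.random.rand(1000)                 # independent U[0,1] variates
samples = gamma.ppf(U, 2)                # inverse CDF of a gamma(shape=2) distribution
print("Sample mean:", samples.mean())    # should be close to the theoretical mean of 2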

We can now express the model as follows:

  1. We sample standard normal (N(0,1)) variates to represent the "driver" variable
  2. For each risk sample an additional set of normal variates
  3. Take a weighted sum of the "driver" and the additional normal variates to give a new (dependent) normal variate
  4. Standardise the result from step 3) and convert to a U[0,1] variable using the standard gaussian CDF
  5. Use an inverse transform to convert the result of step 4) to a variate as specified by the risk model

We can see that this method is completely general: it does not depend on any assumption about the stand-alone risk model distributions (it is a "copula" method). Another observation is that the normal variates here are in some sense "synthetic" and simply a tool for applying the dependence.

For clarity an example is presented below:

# Simple driver method example
# We model a central driver Z
# We want to model 2 risks: Y1 and Y2 which follow a gamma distribution
# Synthetic normal variates X1 and X2 are used to apply dependence

import numpy as np
from scipy.stats import gamma, norm
import matplotlib.pyplot as plt
%matplotlib inline

# Set number of simulations and random seed
SIMS = 1000
SEED = 123
np.random.seed(SEED)

# Simulate driver variables
Z = np.random.normal(0, 1, SIMS)

# Simulate temporary synthetic variable X1, X2 and standardise
X1 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)
X2 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)

# Use normal CDF to convert X synthetic variables to uniforms U
U1 = norm.cdf(X1)
U2 = norm.cdf(X2)

# Use inverse transforms to create dependent samples of Y1 and Y2
Y1 = gamma.ppf(U1, 2)
Y2 = gamma.ppf(U2, 3)

# Plot a basic scatter to show dependence has been applied and calculate pearson coefficient
plt.scatter(Y1, Y2)
plt.xlabel('Y1')
plt.ylabel('Y2')
plt.show()

correl = np.corrcoef(Y1, Y2)
print("Estimated Pearson Correlation Coefficient:", correl[0,1])
Estimated Pearson Correlation Coefficient: 0.4628059800990357

The example above shows we have correlated gamma variates with around a 50% correlation coefficient (in this case we could calculate the correlation coefficient analytically, but that is not necessary for our purposes; as we create more sophisticated models the analytic solutions become difficult or impossible).

Even from this example we can see how models of this form provide superior scalability: for each additional variable we only need to specify one parameter, the weight given to the central driver. In contrast, a "matrix" method requires each pair-wise combination to be specified (and then requires a procedure to convert the matrix to positive semi-definite form before it can be applied). If our model requires something more sophisticated - say the sum of a correlated gamma and a Weibull distribution - the number of parameters in a matrix representation grows very quickly. It is worth noting we do lose some control: by reducing the number of parameters in this way we lose the ability to express every possible correlation network. In most cases this is not a big problem, as there is insufficient data to estimate a full correlation matrix anyway.
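To make the scalability point concrete, here is a minimal sketch (the add_risk helper is hypothetical, not part of the framework code): each new risk needs only its driver weight and its marginal inverse CDF.

# Hypothetical sketch: adding a risk to a driver model requires one weight and a marginal
import numpy as np
from scipy.stats import norm, gamma, weibull_min

SIMS = 1000
Z = np.random.normal(0, 1, SIMS)   # central driver

def add_risk(driver, weight, marginal_ppf):
    # weighted sum of the driver and an idiosyncratic normal, standardised back to N(0,1)
    x = weight * driver + (1 - weight) * np.random.normal(0, 1, SIMS)
    x /= np.sqrt(weight**2 + (1 - weight)**2)
    # inverse transform to the requested marginal distribution
    return marginal_ppf(norm.cdf(x))

Y1 = add_risk(Z, 0.5, lambda u: gamma.ppf(u, 2))
Y2 = add_risk(Z, 0.5, lambda u: gamma.ppf(u, 3))
Y3 = add_risk(Z, 0.3, lambda u: weibull_min.ppf(u, 1.5))   # new risk: one extra weight parameter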

It is worth pointing out that the type of dependency applied here is a "rank normal" dependency - this is the same dependency structure as in a multi-variate normal distribution, albeit generalised to any marginal distribution.

An Extension to the Simple Driver Method

We can extend the model above by noticing the following: there is nothing stopping the "synthetic" variables from being considered drivers in their own right. Closure of Gaussians under addition does not require the variables to be independent - sums of correlated (jointly normal) variates are still normal! We can thus extend the model to:

# Simple driver method example
# We model a central driver Z
# 2 additional drivers X1 and X2 are calculated off these
# We want to model 2 risks: Y1 and Y2 which follow a gamma distribution
# Synthetic normal variates sX1 and sX2 are used to apply dependence

import numpy as np
from scipy.stats import gamma, norm
import matplotlib.pyplot as plt
%matplotlib inline

# Set number of simulations and random seed
SIMS = 1000
SEED = 123
np.random.seed(SEED)

# Simulate driver variables
Z = np.random.normal(0, 1, SIMS)

# Simulate additional driver variables X1, X2 and standardise
X1 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)
X2 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)

# Simulate Synthetic Variables sX and standardize
sX1 = (0.5 * X1 + 0.25 * X2 + 0.25 * np.random.normal(0, 1, SIMS))
sX1 = (sX1 - sX1.mean()) / sX1.std()
sX2 = (0.5 * X2 + 0.25 * X1 + 0.25 * np.random.normal(0, 1, SIMS))
sX2 = (sX2 - sX2.mean()) / sX2.std()

# Use normal CDF to convert sX synthetic variables to uniforms U
U1 = norm.cdf(sX1)
U2 = norm.cdf(sX2)

# Use inverse transforms to create dependent samples of Y1 and Y2
Y1 = gamma.ppf(U1, 2)
Y2 = gamma.ppf(U2, 3)

# Plot a basic scatter to show dependence has been applied and calculate pearson coefficient
plt.scatter(Y1, Y2)
plt.xlabel('Y1')
plt.ylabel('Y2')
plt.show()

correl = np.corrcoef(Y1, Y2)
print("Estimated Pearson Correlation Coefficient:", correl[0,1])
Estimated Pearson Correlation Coefficient: 0.7851999480298125

As before we have ended up with rank-normal correlated gamma variates. This time we have three potential "driver" variables, Z, X1 and X2, all correlated with each other. It is not hard to see how this procedure can be iterated repeatedly to give arbitrarily many correlated driver variables. Further, we can imagine these variables being oriented in a hierarchy: Z at the bottom layer, X1 and X2 a layer above, and so on.
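A minimal sketch of how the layering might be iterated in code (the child_driver helper and the single-parent weighting are simplifying assumptions for illustration):

# Hypothetical sketch: iterating the driver construction to build a hierarchy of layers
import numpy as np

SIMS = 1000

def child_driver(parent, weight=0.5):
    # weighted sum of a parent driver and an idiosyncratic normal, standardised to N(0,1)
    x = weight * parent + (1 - weight) * np.random.normal(0, 1, SIMS)
    return x / np.sqrt(weight**2 + (1 - weight)**2)

Z = np.random.normal(0, 1, SIMS)   # layer 0: e.g. a global driver
X1 = child_driver(Z)               # layer 1: two drivers built off Z
X2 = child_driver(Z)
X11 = child_driver(X1)             # layer 2: correlated with Z, X1 and X2 through the hierarchy
X12 = child_driver(X1)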

What is a Driver?

We should now take a step back and think about the implications for the insurance aggregation problem. As stated previously this method allows us to define dependency with far fewer parameters than using a matrix approach. When you start getting into the realms of 100,000s of modelled variables this becomes increasingly important from a calibration perspective.

However there are other benefits: for example we can look at how the model variables relate to the driver variables. We can ask questions such as: "What is the distribution of the modelled variables when driver Z is above its 75th percentile?" and so on. This is a form of counter-factual analysis that can be performed with the model; the matrix approaches offer no such ability. For counter-factual analysis to be useful, however, we require real-world interpretations of the drivers themselves. By limiting ourselves to counter-factual analysis based on driver percentiles (e.g. after the normal CDF is applied to Z, X1, X2, leading to uniformly distributed driver variables) we make no assumption about the distribution of the driver itself, only about its relationship with other drivers.
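As a concrete sketch of such a query, continuing from the first example above (assuming Z, Y1 and Y2 are still in memory):

# Sketch of a percentile-based counter-factual query on the driver
import numpy as np

stressed = Z > np.percentile(Z, 75)   # simulations where the driver is above its 75th percentile
print("Mean of Y1, all simulations:      ", Y1.mean())
print("Mean of Y1, driver above 75th pct:", Y1[stressed].mean())
print("Mean of Y2, driver above 75th pct:", Y2[stressed].mean())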

By not making a distributional assumption, a driver can represent any stochastic process. This is an important but subtle point. For example we could create a driver for the "global economy" (Z) and, by taking weighted sums, create new drivers for the "US economy" (X1) and the "European economy" (X2). In this example there may be data-driven calibrations for suitable weights (e.g. using GDP figures), however it is also relatively easy to use expert judgement. In my experience it is actually easier to elicit parameters in this style of model than to elicit "correlation" parameters, given the natural interpretation.

Given this natural interpretation we can quite easily begin to answer questions such as: "What might happen to the insurance portfolio in the case of a European economic downturn?" and so on. Clearly the level of detail in the driver structure controls what sort of questions can be answered.

As stated previously, we can repeat the mechanics of creating drivers to create new "levels" of drivers (e.g. moving from "European economy" to "French economy", "UK economy" and so on). We can also create multiple "families" of drivers; for example, in addition to looking at economies we may consider a family relating to "political unrest", which again could be broken down into region, then country, and so on. Other driver families may not have a geographic interpretation - for example commodity prices. In some cases the families may be completely independent of each other; in other cases they can depend on each other (e.g. commodity prices will have some relationship with the economy).

In the examples so far we have presented a "top down" implementation: we start by modelling a global phenomenon and then build "smaller" phenomena out of it. There is nothing special about this; we could just as easily have presented a "bottom up" implementation: take a number of "Z" variables to represent regions and combine these to form an "X" representing a global variable. Neither implementation is necessarily better than the other and, with proper calibration, they lead to mathematically equivalent behaviours. In practice, however, I have found the "top down" approach works better. Typically you will start with a simple model which iterates over time and becomes more sophisticated; the top down approach makes it easier to maintain "backward compatibility", which is a very useful feature for any modelling framework (e.g. suppose the first iteration of the framework only considers economic regions and the next iteration adds a model which requires country splits - with top down, adding new country variables keeps the economic regions identical without requiring any additional thought).

The Need for More Sophistication: Tail Dependence

Unfortunately the model presented so far is still quite a way from being useful. We may have found a way of calibrating a joint distribution using relatively few (O(N)) parameters and can (in some sense) perform counter-factual analysis, but there is still a big issue.

So far the method only allows for rank-normal joint behaviour. From the analysis of complex systems we know that this is not necessarily a good assumption (please see other blog posts for details). We are particularly interested in "tail dependence", in layman's terms: "when things go bad, they go bad together". Tail dependence can arise for any number of reasons:

  • Structural changes in the system
  • Feedback
  • State space reduction
  • Multiplicative processes
  • Herd mentality/other human behaviours
  • And many others

Given the framework we are working within we are not particularly interested in how these effects occur, we are just interested in replicating the behaviour.

To do this we will extend the framework to cover a multivariate student-t dependence structure, noting the following representation: $$ T_{\nu} \sim \frac{Z} {\sqrt{ \frac{\chi^2_{\nu}} {\nu}}} $$ where:
$ T_{\nu} $ follows a student-t distribution with $\nu$ degrees of freedom
$ Z $ follows a standard normal $N(0,1)$
$ \chi^2_{\nu} $ follows Chi-Square with $\nu$ degrees of freedom

Therefore we can easily extend the model to allow for tail dependence: the (dependent) synthetic normals are divided by a common chi-square term and the student-t CDF is applied in place of the normal CDF.

# Simple driver method example
# We model a central driver Z
# 2 additional drivers X1 and X2 are calculated off these
# We want to model 2 risks: Y1 and Y2 which follow a gamma distribution
# Synthetic normal variates sX1 and sX2 are used to apply dependence
# Tail dependence is added through Chi

import numpy as np
from scipy.stats import gamma, norm, chi2, t
import matplotlib.pyplot as plt
%matplotlib inline

# Set number of simulations and random seed
SIMS = 1000
SEED = 123
np.random.seed(SEED)

# Simulate driver variables
Z = np.random.normal(0, 1, SIMS)

# Simulate additional driver variables X1, X2 and standardise
X1 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)
X2 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)

# Simulate Synthetic Variables sX and standardize
sX1 = (0.5 * X1 + 0.25 * X2 + 0.25 * np.random.normal(0, 1, SIMS))
sX1 = (sX1 - sX1.mean()) / sX1.std()
sX2 = (0.5 * X2 + 0.25 * X1 + 0.25 * np.random.normal(0, 1, SIMS))
sX2 = (sX2 - sX2.mean()) / sX2.std()

# Simulate Chi-Square for tail-dependence
nu = 3
Chi = chi2.rvs(nu, size=SIMS)
sX1 /= np.sqrt(Chi / nu)
sX2 /= np.sqrt(Chi / nu)

# Use t CDF to convert sX synthetic variables to uniforms U
U1 = t.cdf(sX1, df=nu)
U2 = t.cdf(sX2, df=nu)

# Use inverse transforms to create dependent samples of Y1 and Y2
Y1 = gamma.ppf(U1, 2)
Y2 = gamma.ppf(U2, 3)

# Plot a basic scatter to show dependence has been applied and calculate pearson coefficient
plt.scatter(Y1, Y2)
plt.xlabel('Y1')
plt.ylabel('Y2')
plt.show()

correl = np.corrcoef(Y1, Y2)
print("Estimated Pearson Correlation Coefficient:", correl[0,1])
Estimated Pearson Correlation Coefficient: 0.7907911109201866

Adding Flexibility

We can further extend this model by allowing each model variate to have its own degree of tail dependence. Why is this important? Within this framework we are spanning many different models, and selecting a single degree of tail dependence might not be suitable for all variables. We can achieve this by applying another inverse transform: $$ T_{\nu} \sim \frac{Z} {\sqrt{ \frac{F^{-1}_{\chi^2_{\nu}}(U)} {\nu}}} $$ As before, but where:
$U$ follows a uniform U[0,1] distribution
$F^{-1}_{\chi^2_{\nu}}$ is the inverse cdf of $\chi^2_{\nu}$

# Simple driver method example
# We model a central driver Z
# 2 additional drivers X1 and X2 are calculated off these
# We want to model 2 risks: Y1 and Y2 which follow a gamma distribution
# Synthetic normal variates sX1 and sX2 are used to apply dependence
# Tail dependence is added through Chi1 and Chi2 with varying degrees of freedom

import numpy as np
from scipy.stats import gamma, norm, chi2, t
import matplotlib.pyplot as plt
%matplotlib inline

# Set number of simulations and random seed
SIMS = 1000
SEED = 123
np.random.seed(SEED)

# Simulate driver variables
Z = np.random.normal(0, 1, SIMS)

# Simulate additional driver variables X1, X2 and standardise
X1 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)
X2 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)

# Simulate Synthetic Variables sX and standardize
sX1 = (0.5 * X1 + 0.25 * X2 + 0.25 * np.random.normal(0, 1, SIMS))
sX1 = (sX1 - sX1.mean()) / sX1.std()
sX2 = (0.5 * X2 + 0.25 * X1 + 0.25 * np.random.normal(0, 1, SIMS))
sX2 = (sX2 - sX2.mean()) / sX2.std()

# Simulate Chi-Square for tail-dependence
nu1 = 2
nu2 = 4
U = np.random.rand(SIMS)
Chi1 = chi2.ppf(U,df=nu1)
Chi2 = chi2.ppf(U, df=nu2)
sX1 /= np.sqrt(Chi1 / nu1)
sX2 /= np.sqrt(Chi2 / nu2)

# Use t CDF to convert sX synthetic variables to uniforms U
U1 = t.cdf(sX1, df=nu1)
U2 = t.cdf(sX2, df=nu2)

# Use inverse transforms to create dependent samples of Y1 and Y2
Y1 = gamma.ppf(U1, 2)
Y2 = gamma.ppf(U2, 3)

# Plot a basic scatter to show dependence has been applied and calculate pearson coefficient
plt.scatter(Y1, Y2)
plt.xlabel('Y1')
plt.ylabel('Y2')
plt.show()

correl = np.corrcoef(Y1, Y2)
print("Estimated Pearson Correlation Coefficient:", correl[0,1])
Estimated Pearson Correlation Coefficient: 0.7703228652641819

There is a small practical issue relating to multivariate student-t distributions: namely that we lose the ability to assume independence. This is a direct result of allowing for tail dependence. In many situations this is not an issue; however, within this framework we have models covering very disparate processes, some of which may genuinely exhibit independence. To illustrate the issue we re-run the existing model with zero driver weights (an "attempt to model independence").
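The modification amounts to something like the following sketch (only the weight-related lines change; the imports, SIMS and seed are assumed to carry over from the previous block), with the resulting correlation shown below:

# Sketch: zero driver weights - the drivers and synthetic variables become pure idiosyncratic noise,
# but the single shared uniform behind the chi-square terms still links Y1 and Y2
X1 = np.random.normal(0, 1, SIMS)
X2 = np.random.normal(0, 1, SIMS)
sX1 = np.random.normal(0, 1, SIMS)
sX2 = np.random.normal(0, 1, SIMS)

nu1, nu2 = 2, 4
U = np.random.rand(SIMS)              # still one common uniform driving both chi-squares
Chi1 = chi2.ppf(U, df=nu1)
Chi2 = chi2.ppf(U, df=nu2)
sX1 /= np.sqrt(Chi1 / nu1)
sX2 /= np.sqrt(Chi2 / nu2)

Y1 = gamma.ppf(t.cdf(sX1, df=nu1), 2)
Y2 = gamma.ppf(t.cdf(sX2, df=nu2), 3)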

Estimated Pearson Correlation Coefficient: 0.08220534833363176

As we can see there is still a dependence between Y1 and Y2, induced through the common chi-square variates. We can overcome this issue by "copying" the driver process: the common uniform distribution is then replaced by a number of correlated uniform distributions, which allows for genuine independence. An implementation of this can be seen in the code sample below:

# Simple driver method example
# We model a central driver Z
# 2 additional drivers X1 and X2 are calculated off these
# We want to model 2 risks: Y1 and Y2 which follow a gamma distribution
# Synthetic normal variates sX1 and sX2 are used to apply dependence
# Tail dependence is added through Chi1 and Chi2 with varying degrees of freedom
# Chi1 and Chi2 are driven by X1tail and X2tail which are copies of X1 and X2 drivers

import numpy as np
from scipy.stats import gamma, norm, chi2, t
import matplotlib.pyplot as plt
%matplotlib inline

# Set number of simulations and random seed
SIMS = 1000
SEED = 123
np.random.seed(SEED)

# Simulate driver variables
Z = np.random.normal(0, 1, SIMS)

# Simulate copy of driver for tail process
Ztail = np.random.normal(0, 1, SIMS)

# Simulate additional driver variables X1, X2 and standardise
X1 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)
X2 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)

# Simulate additional tail drivers
X1tail = (0.5 * Ztail + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)
X2tail = (0.5 * Ztail + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)

# Simulate Synthetic Variables sX and standardize
sX1 = (0.5 * X1 + 0.25 * X2 + 0.25 * np.random.normal(0, 1, SIMS))
sX1 = (sX1 - sX1.mean()) / sX1.std()
sX2 = (0.5 * X2 + 0.25 * X1 + 0.25 * np.random.normal(0, 1, SIMS))
sX2 = (sX2 - sX2.mean()) / sX2.std()

# Simulate Synthetic Variables for tail process
sX1tail = (0.5 * X1tail + 0.25 * X2tail + 0.25 * np.random.normal(0, 1, SIMS))
sX1tail = (sX1tail - sX1tail.mean()) / sX1tail.std()
sX2tail = (0.5 * X2tail + 0.25 * X1tail + 0.25 * np.random.normal(0, 1, SIMS))
sX2tail = (sX2tail - sX2tail.mean()) / sX2tail.std()

# Simulate Chi-Square for tail-dependence
nu1 = 2
nu2 = 4
Chi1 = chi2.ppf(norm.cdf(sX1tail),df=nu1)
Chi2 = chi2.ppf(norm.cdf(sX2tail), df=nu2)
sX1 /= np.sqrt(Chi1 / nu1)
sX2 /= np.sqrt(Chi2 / nu2)

# Use t CDF to convert sX synthetic variables to uniforms U
U1 = t.cdf(sX1, df=nu1)
U2 = t.cdf(sX2, df=nu2)

# Use inverse transforms to create dependent samples of Y1 and Y2
Y1 = gamma.ppf(U1, 2)
Y2 = gamma.ppf(U2, 3)

# Plot a basic scatter to show dependence has been applied and calculate pearson coefficient
plt.scatter(Y1, Y2)
plt.xlabel('Y1')
plt.ylabel('Y2')
plt.show()

correl = np.corrcoef(Y1, Y2)
print("Estimated Pearson Correlation Coefficient:", correl[0,1])
Estimated Pearson Correlation Coefficient: 0.7406745557389065

To show this allows full independence we repeat the zero-weight example:

Estimated Pearson Correlation Coefficient: -0.01456173215652803

We can see that this is a much better scatter plot if we are looking for independence!

Non-Centrality

We now extend this model yet further. So far we have allowed for tail dependence, but it treats both tails equally. In some instances this can be problematic: for example, if we rely on output from the framework to do any kind of risk-reward comparison, the upside and downside behaviour are both important. While it is easy to think of structural changes leading to downside tail dependence, an upside tail dependence is typically harder to justify. We can allow for this with a simple change to the model, namely: $$ T_{\nu, \mu} \sim \frac{Z + \mu} {\sqrt{ \frac{F^{-1}_{\chi^2_{\nu}}(U)} {\nu}}} $$ The addition of the $\mu$ parameter means that $T_{\nu, \mu}$ follows a non-central student-t distribution with $\nu$ degrees of freedom and non-centrality $\mu$. Details of this distribution can be found on Wikipedia. By selecting large positive values of $\mu$ we can create tail dependence in the higher percentiles, large negative values create tail dependence in the lower percentiles, and a zero value leads to a symmetrical dependency. Adjusting the code further we get:

# Simple driver method example
# We model a central driver Z
# 2 additional drivers X1 and X2 are calculated off these
# We want to model 2 risks: Y1 and Y2 which follow a gamma distribution
# Synthetic normal variates sX1 and sX2 are used to apply dependence
# Tail dependence is added through Chi1 and Chi2 with varying degrees of freedom
# Chi1 and Chi2 are driven by X1tail and X2tail which are copies of X1 and X2 drivers
# We add non-centrality through an additive scalar

import numpy as np
from scipy.stats import gamma, norm, chi2, nct
import matplotlib.pyplot as plt
%matplotlib inline

# Set number of simulations and random seed
SIMS = 1000
SEED = 123
np.random.seed(SEED)

# Simulate driver variables
Z = np.random.normal(0, 1, SIMS)

# Simulate copy of driver for tail process
Ztail = np.random.normal(0, 1, SIMS)

# Simulate additional driver variables X1, X2 and standardise
X1 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)
X2 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)

# Simulate additional tail drivers
X1tail = (0.5 * Ztail + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)
X2tail = (0.5 * Ztail + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)

# Simulate Synthetic Variables sX and standardize
sX1 = (0.5 * X1 + 0.25 * X2 + 0.25 * np.random.normal(0, 1, SIMS))
sX1 = (sX1 - sX1.mean()) / sX1.std()
sX2 = (0.5 * X2 + 0.25 * X1 + 0.25 * np.random.normal(0, 1, SIMS))
sX2 = (sX2 - sX2.mean()) / sX2.std()

# Simulate Synthetic Variables for tail process
sX1tail = (0.5 * X1tail + 0.25 * X2tail + 0.25 * np.random.normal(0, 1, SIMS))
sX1tail = (sX1tail - sX1tail.mean()) / sX1tail.std()
sX2tail = (0.5 * X2tail + 0.25 * X1tail + 0.25 * np.random.normal(0, 1, SIMS))
sX2tail = (sX2tail - sX2tail.mean()) / sX2tail.std()

# Simulate Chi-Square for tail-dependence
nu1 = 2
nu2 = 4
Chi1 = chi2.ppf(norm.cdf(sX1tail),df=nu1)
Chi2 = chi2.ppf(norm.cdf(sX2tail), df=nu2)
sX1 /= np.sqrt(Chi1 / nu1)
sX2 /= np.sqrt(Chi2 / nu2)

# Specify the non-centrality values
nc1 = -2
nc2 = -2

# Use non-central t CDF to convert sX synthetic variables to uniforms U
U1 = nct.cdf(sX1+nc1, nc=nc1, df=nu1)
U2 = nct.cdf(sX2+nc2, nc=nc2, df=nu2)

# Use inverse transforms to create dependent samples of Y1 and Y2
Y1 = gamma.ppf(U1, 2)
Y2 = gamma.ppf(U2, 3)

# Plot a basic scatter to show dependence has been applied and calculate pearson coefficient
plt.scatter(Y1, Y2)
plt.xlabel('Y1')
plt.ylabel('Y2')
plt.show()

correl = np.corrcoef(Y1, Y2)
print("Estimated Pearson Correlation Coefficient:", correl[0,1])
Estimated Pearson Correlation Coefficient: 0.7100911602838634

In the code example we have selected a non-centrality of -2, which is a fairly large negative value; we can see the dependency increasing in the lower percentiles (clustering around (0,0) on the plot).

Temporal Considerations

So far we have essentially considered a "static" model: we have modelled a number of drivers representing values for a single time period. For the majority of insurance contracts this is sufficient: we are only interested in losses occurring over the period the contract is active. However in some instances contracts relate to multiple time periods and it does not make sense to consider losses over the entire lifetime as a single block. Moreover it is not ideal to model time periods as independent from one another. To take the US economy example: if the US enters recession in 2020 it is (arguably) more likely that it will also be in recession in 2021. Clearly the dynamics of this are very complex and constructing a detailed temporal model is very difficult; however, for the sake of creating the drivers we do not need to know the exact workings. Instead we are looking for a simple implementation that gives dynamics that are somewhat justifiable.

Fortunately it is relatively easy to add this functionality to the framework described so far. Essentially we adopt a Markovian assumption whereby a driver in time period t+1 is a weighted sum of its value at time t and an idiosyncratic component. Of course this is not a perfect description of the temporal behaviour of every possible driver, but it should not be completely unjustifiable in most instances, and the trajectories should not appear totally alien (e.g. the US economy sitting in the top 1% one year and then the bottom 1% the next, very frequently).

To illustrate this please see the code example below, for brevity I will change the model code above to a functional definition to avoid repeating blocks of code.

# Creating temporally dependent variables

import numpy as np
from scipy.stats import gamma, norm, chi2, nct
import matplotlib.pyplot as plt
%matplotlib inline

# Set number of simulations and random seed
SIMS = 1000
SEED = 123
np.random.seed(SEED)

# Define function to create correlated normal distributions
def corr_driver():
    # Create driver Z
    Z = np.random.normal(0, 1, SIMS)
    
    # Create drivers X1, X2
    X1 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)
    X2 = (0.5 * Z + 0.5 * np.random.normal(0, 1, SIMS)) / np.sqrt(0.5**2 + 0.5**2)
    
    return np.array([X1, X2])

# Create drivers variables for time periods t0 and t1
driver_t0 = corr_driver()
driver_t1 = 0.5 * driver_t0 + 0.5 * corr_driver() / np.sqrt(0.5**2 + 0.5**2)

# Create copy of drivers for tail process time periods t0 and t1
tail_t0 = corr_driver()
tail_t1 = 0.5 * tail_t0 + 0.5 * corr_driver() / np.sqrt(0.5**2 + 0.5**2)

# Define a standardise function
def standardise(x):
    return (x - x.mean()) / x.std()

# Create sythetic variables sX1 sX2 for variable 1 and 2 at times t0 and t1
# Note depending on the model idiosyncratic components may also be dependent
sX1t0 = standardise(0.25*driver_t0[0] + 0.5*driver_t0[1] + 0.25*np.random.normal(0, 1, SIMS))
sX1t1 = standardise(0.25*driver_t1[0] + 0.5*driver_t1[1] + 0.25*np.random.normal(0, 1, SIMS))
sX2t0 = standardise(0.5*driver_t0[0] + 0.25*driver_t0[1] + 0.25*np.random.normal(0, 1, SIMS))
sX2t1 = standardise(0.5*driver_t1[0] + 0.25*driver_t1[1] + 0.25*np.random.normal(0, 1, SIMS))

# Repeat synthetic variable construction for tail process
sX1tailt0 = standardise(0.25*tail_t0[0] + 0.5*tail_t0[1] + 0.25*np.random.normal(0, 1, SIMS))
sX1tailt1 = standardise(0.25*tail_t1[0] + 0.5*tail_t1[1] + 0.25*np.random.normal(0, 1, SIMS))
sX2tailt0 = standardise(0.5*tail_t0[0] + 0.25*tail_t0[1] + 0.25*np.random.normal(0, 1, SIMS))
sX2tailt1 = standardise(0.5*tail_t1[0] + 0.25*tail_t1[1] + 0.25*np.random.normal(0, 1, SIMS))

# Simulate Chi-Square for tail-dependence t0 and t1
nu1 = 2
nu2 = 4

Chi1t0 = chi2.ppf(norm.cdf(sX1tailt0),df=nu1)
Chi2t0 = chi2.ppf(norm.cdf(sX2tailt0), df=nu2)
sX1t0 /= np.sqrt(Chi1t0 / nu1)
sX2t0 /= np.sqrt(Chi2t0 / nu2)

Chi1t1 = chi2.ppf(norm.cdf(sX1tailt1),df=nu1)
Chi2t1 = chi2.ppf(norm.cdf(sX2tailt1), df=nu2)
sX1t1 /= np.sqrt(Chi1t1 / nu1)
sX2t1 /= np.sqrt(Chi2t1 / nu2)

# Specify the non-centrality values
nc1 = 2
nc2 = 2

# Use non-central t CDF to convert sX synthetic variables to uniforms U for t0 and t1
U1t0 = nct.cdf(sX1t0+nc1, nc=nc1, df=nu1)
U2t0 = nct.cdf(sX2t0+nc2, nc=nc2, df=nu2)
U1t1 = nct.cdf(sX1t1+nc1, nc=nc1, df=nu1)
U2t1 = nct.cdf(sX2t1+nc2, nc=nc2, df=nu2)

# Use inverse transforms to create dependent samples of Y1 and Y2 at t0 and t1
Y1t0 = gamma.ppf(U1t0, 2)
Y2t0 = gamma.ppf(U2t0, 3)
Y1t1 = gamma.ppf(U1t1, 2)
Y2t1 = gamma.ppf(U2t1, 3)

# Plot a basic scatter to show dependence has been applied and calculate pearson coefficient
plt.scatter(Y1t0, Y1t1)
plt.xlabel('Y1(t=t0)')
plt.ylabel('Y1(t=t1)')
plt.show()

correl = np.corrcoef(Y1t0, Y1t1)
print("Estimated Pearson Auto-Correlation Coefficient:", correl[0,1])
Estimated Pearson Auto-Correlation Coefficient: 0.37600307233845764

In this code example we created two variables, Y1 and Y2, each taking a value from a gamma distribution at times t0 and t1. Y1 and Y2 have a dependency between each other as well as a temporal dependency (auto-correlation).

As with any temporal model the choice of time period is important; for insurance contracts yearly time periods typically make sense. However in one particular model I developed there was a need for monthly simulations. Rather than re-parameterising the entire central driver structure to work on a monthly basis (creating lots of extra data that would not be used by the vast majority of the models) I applied a "Brownian bridge" type argument to interpolate driver simulations for each month.
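A minimal sketch of the interpolation idea is shown below (a hypothetical helper, not the production code): conditional on the yearly start and end values, intermediate months are filled in sequentially using the Brownian bridge conditional distribution, under the simplifying assumption that the driver path behaves like a unit-variance-per-month random walk within the year.

# Hypothetical sketch: Brownian-bridge style interpolation of monthly driver values
# between two yearly driver simulations
import numpy as np

def monthly_bridge(d_start, d_end, n_steps=12):
    # d_start, d_end: driver values at the start and end of the year (one value per simulation)
    sims = d_start.shape[0]
    path = np.empty((n_steps + 1, sims))
    path[0] = d_start
    for k in range(1, n_steps):
        remaining = n_steps - (k - 1)                     # steps from the previous point to the endpoint
        mean = path[k - 1] + (d_end - path[k - 1]) / remaining
        var = (remaining - 1) / remaining                 # conditional variance of the bridge step
        path[k] = mean + np.sqrt(var) * np.random.normal(0, 1, sims)
    path[n_steps] = d_end
    return path

# Example usage with the yearly drivers from the block above (the monthly values would
# still need standardising before being pushed through a CDF):
# monthly_Z = monthly_bridge(driver_t0[0], driver_t1[0])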

Notes on Implementation

In this blog post I have not included the code exactly as it is implemented in production since this is my employer's IP. The implementation presented here is not very efficient and trying to run large portfolios in this way will be troublesome. In the full production implementation I used the following:

  1. Strict memory management, as this is a memory hungry program
  2. Certain aspects of the implementation are slow in pure Python (and even NumPy), so Cython and Numba are used for performance
  3. The SciPy stats module is convenient but restrictive; it is better to either use the Cython bindings for SciPy special functions or implement the functions from scratch. By implementing extended forms of some of the distribution functions one is also able to allow for non-integer degrees of freedom, which is useful
  4. The model naturally lends itself to arrays (vectors, matrices, tensors), however these tend to be sparse in nature, so it is often better to construct "sparse multiply" type operations rather than use built-in functions like np.dot (see the sketch after this list)
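As an illustration of the sparse point above, a minimal sketch (the weight matrix and dimensions are made up for illustration):

# Hypothetical sketch: applying driver weights as a sparse matrix multiply
import numpy as np
from scipy.sparse import csr_matrix

SIMS = 1000
drivers = np.random.normal(0, 1, (4, SIMS))   # 4 driver variables

# Each risk loads on only a couple of drivers, so the weight matrix is mostly zeros
W = csr_matrix(np.array([
    [0.5, 0.25, 0.0, 0.0],
    [0.0, 0.5, 0.25, 0.0],
    [0.0, 0.0, 0.25, 0.5],
]))

weighted = W @ drivers   # (3 x SIMS) weighted sums without forming a dense product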

Conclusion

This blog post represents the current iteration of the aggregation framework I have developed. It is considered a "version 0.1" implementation and is expected to develop as we use it more extensively and uncover further properties or issues. For example, it is clear that regardless of the parameters selected the joint behaviour will always be (approximately) elliptical; as presented it is not possible to implement non-linearities (e.g. the price of some asset only attaining a maximum/minimum value dependent on some other driver indicator). It is not difficult to implement ideas like this when the need arises; the difficulty is more around how to implement them in a seamless way.

There are a couple of additional benefits to this framework which have not been mentioned; I will outline them briefly here:

  1. It is possible to parallelise this process quite effectively as there are minimal bottlenecks/race conditions
  2. The driver variables can be generated centrally and models can link to this central variable repository. From a work-flow perspective this means that individual underwriting teams can run models independently (and quickly), leaving the risk teams to collate and analyse the results. (Sometimes called a federated workflow.)
  3. The federated workflow means no specialist hardware is required, even very large portfolios can be run on standard desktops/laptops.

The current production version of this framework has around 5-10,000 driver variables ("X1, X2") across 5 hierarchical layers. These drive the dependence between around 500,000 individual modelled variables ("Y1, Y2") over 20 time periods ("t0, t1"). The quality of risk management analysis and reporting has increased dramatically as a result.

There are still some things left to do in relation to this framework and the work is on-going. These include:

  1. Work relating to calibration and how to do this as efficiently as possible
  2. Further work on increasing code efficiency
  3. Further mathematical study of the framework's parameters
  4. Study of the implied network behaviour: since we're placing risks on a network (driver structure) can we gain additional insight by considering contagion, critical nodes, etc.?
  5. Further improvements to the workflow, how the model data is stored/collated etc.