This is the third blog post in a series - you can find the previous blog post here


In this blog post we continue to look at the problem of sampling from multi-variate distributions. Recall from the previous blog post that a copula $C(.)$ is a multivariate function:

$$ C(u_1, \, ... \, , u_N) = \mathbb{P}(F_1(X_1) \leq u_1, \, ... \, , F_N(X_N) \leq u_N) $$

We note that this defines a multivariate CDF. We can find the corresponding density function (PDF) $c(.)$ in the usual way:

$$ c(u_1, \, ... \, , u_N) = \frac{\partial^N}{\partial u_1 \, ... \, \partial u_N} C(u_1, \, ... \, , u_N) $$

Using this representation of the copula we can express the joint density $f(.)$ as a product of the copula density above and the marginal densities $f_i(.)$ (with corresponding CDFs $F_i(x_i)$):

$$ \underbrace{f(x_1, \, ... \, x_N)}_\text{Joint density} = \underbrace{f_1(x_1) \, ... \, f_N(x_N)}_\text{Marginal density} \, \, \, \underbrace{c(F_1(x_1), \, ... \, ,F_N(x_N))}_\text{Copula density}$$

This highlights the power of the copula method: in modelling we are able to separate the marginal behaviour and the dependence structure into independent parts.
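
A quick way to see this factorisation in action is to check it numerically for a bivariate Gaussian with correlation $\rho$ and standard normal marginals. The sketch below (the function names are my own) confirms that the joint density equals the product of the two marginal densities and the Gaussian copula density:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rho = 0.6

def gaussian_copula_density(u, v, rho):
    # Bivariate Gaussian copula density c(u, v)
    a, b = norm.ppf(u), norm.ppf(v)
    return np.exp(-(rho**2 * (a**2 + b**2) - 2 * rho * a * b)
                  / (2 * (1 - rho**2))) / np.sqrt(1 - rho**2)

x, y = 0.3, -1.2
joint = multivariate_normal([0, 0], [[1, rho], [rho, 1]]).pdf([x, y])
factorised = norm.pdf(x) * norm.pdf(y) * gaussian_copula_density(norm.cdf(x), norm.cdf(y), rho)
print(joint, factorised)  # the two values agree up to floating point error
```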

Problems with the Copula Method

Things are not as rosy as they seem, however. We can use data to inform the marginal density functions; this is well established and we have a wide range of techniques at our disposal. The copula density, however, remains problematic: in anything more than the bivariate case we are unlikely to ever have enough data to inform model selection and calibration. We thus have to rely quite heavily on "trial and error" to find the correct structure to use and, furthermore, to calibrate the parameters of that structure to give the desired result.

We also have issues with copula based modelling in higher dimensions. Typically an Archimedean copula will not be suitable in high dimensions: for an Archimedean copula to be usable its generator needs to be well understood mathematically, which results in only a handful being widely used. Those that are used typically have only a handful of parameters and so are not always flexible. For example the usual implementation of the multi-variate Gumbel copula (used for modelling upper-tail dependence in high dimensions) has a single parameter controlling the degree of tail dependence (and overall dependency) - this is not very useful where we think some subset of variables will be more strongly tied than others.

As a result high-dimension multi-variate modelling typically relies on the joint-elliptical methods (typically the Gaussian/rank-normal or the student-t). In their standard forms these have a number of issues:

  • We need to specify an entire dependency matrix. This doesn't scale well with size and adding a few new variables often requires us to completely re-calibrate the matrix from scratch in order to ensure sensible behaviour, positive definiteness and so on
  • The Gaussian copula shows no tail dependence and the standard student-t copula has only a single parameter controlling the tail dependence of all variable pairs
  • The tail dependence of the student-t copula is symmetric, which is not always desired
  • With the student-t copula we cannot enforce independence

We can apply various techniques, generalizations and extensions to (at least partially) address the issues above (e.g. see my insurance aggregation model). However these methods still lack the flexibility to model certain relations. For example suppose we have 3 variables $(X, Y, Z)$ and we want to apply a dependence structure such that $X$ and $Y$ have a tie and $X$ and $Z$ have a tie - but given $Z$, $X$ and $Y$ are independent. This can arise in various scenarios and yet it is very difficult to capture through a joint-elliptical copula. We therefore require a more powerful method.

Vine Copula

This is where the vine copula comes in. As with many computational statistical methods we want to produce a general tool that will work in a wide variety of situations. In the case of multi-variate sampling we want something that scales nicely with dimension. In order to do this we will look to specify the dependency structure using a probabilistic graphical model (PGM).

We will not get into all the details surrounding PGMs since that is a field of study all to itself. Instead we will just note that a graph such as that seen below:

Can be used to specify a joint density by factorising over cliques in the undirected graph (strictly speaking the factors are clique potentials up to a normalising constant, but we write them loosely as densities here):

$$ f(X_A, X_B, X_C, X_D, X_E, X_F) = f_{ABC}(X_A, X_B, X_C) f_{CD}(X_C, X_D) f_{DEF}(X_D, X_E, X_F) $$

We can note that this has more than a passing resemblance to our joint distribution decomposition using copulae above. The crux of the vine copula is finding a decomposition of a copula using a graphical model; we can then specify individual copulae on each clique (e.g. in the example above clique $(A,B,C)$ could be defined via a rank-normal while clique $(C,D)$ via a Gumbel, and so on). Of course this offers us tremendous flexibility, to the point of leading to option paralysis! Certain simplifications have been introduced to help with this.

In vine copulae the graphical model used is, as the name suggests, a vine (a nested sequence of trees). By enforcing this requirement we bypass the need for worrying about loops and all the other technical gubbins that make working with graphical models generally quite difficult.

Regular Vines

We start by defining vines in a more precise way. A vine $\mathcal{V}$ defined on $N$ variables is a nested set of connected trees $\mathcal{V} = \{T_1 \, , \, ... \, , \, T_{N-1} \}$, where the nodes of $T_{j+1}$ are the edges of $T_{j}$. A regular vine is a vine where two edges in $T_j$ are joined by an edge in $T_{j+1}$ only if they share a common node. It is much easier to work with regular vines than non-regular vines. We can define this more formally:


Definition: Regular Vine

$\mathcal{V}$ is a regular vine on $N$ elements with edge set: $E(\mathcal{V}) = E_1 \cup E_2 \cup \, ... \, \cup E_{N-1}$ if:

  1. $\mathcal{V} = \{T_1 \, , \, ... \, , \, T_{N-1} \}$
  2. $T_1$ is a connected tree with nodes $N_1 = \{1 \, , \, ... \, , \, N \}$ and edges $E_1$. For $i=2, \, ... \, , N-1$ : $T_i$ is a tree with nodes $N_i = E_{i-1}$ and edges $E_i$
  3. For $i=2, ... N-1$ : for $\{a,b\} \in E_i$ we have: $|a \triangle b| = 2$ for $\triangle$ the symmetric difference operator

We can further delineate regular vines into C-vines (canonical vines), where for each tree $T_i$ there is a unique node with $N-i$ connecting edges (degree) - the maximum possible degree for any node at that level. A regular vine is called a D-vine if all nodes in $T_1$ have a degree no higher than 2. Of course we can also have a vine that is neither a C-vine nor a D-vine while still satisfying the definition above - we call these R-vines. In some sense an R-vine sits "in between" the C-vine and D-vine, which act as boundary cases.

It is often easier to see examples than to battle with formal definitions (especially with PGMs) so let's look at an example in each class:

Alternatively we can look at the sequence of trees separately:

If we expressed this vine in the form from before it would look like:

We can also make some further definitions:


Definition: Constraint Set, Conditioning Set, Conditioned Set

  1. For $e \in E_i$ ($i \leq N-1$) the constraint set associated with $e$ is the complete union $U_e^*$ - the subset of $\{1, \, ... \, , N\}$ reachable from $e$ by the membership relation
  2. For $i \leq N-1$ and $e \in E_i$ : if $e=\{j, k\}$ then the conditioning set associated with $e$ is: $$ D_e = U_j^* \cap U_k^*$$ The conditioned set associated with $e$ is: $$ \{ C_{e,j} , C_{e,k} \} = \{ U_j^* \backslash D_e, U_k^* \backslash D_e \}$$

In the example above the edge XXX has conditioning set XXX and conditioned set XXX - hence the choice of naming convention!

For a regular vine $\mathcal{V} = \{T_1 \, , \, ... \, , \, T_{N-1} \}$ we have the following properties:

  1. The number of edges is $N(N-1)/2$
  2. Each pair of variables occurs exactly once as a conditioned set
  3. If two edges have the same conditioning set then they are the same edge

The scaling of regular vines works as follows:

  1. For any regular vine on $(N-1)$ elements there are $2^{N-3}$ choices of $N$-dimensional regular vine that extend it
  2. There are ${N \choose 2} (N-2)!2^{(N-2)(N-3)/2}$ labelled regular vines

We can thus see that model selection for regular vine copula methods can quickly become unwieldy. We often need to rely on sophisticated methods to do this (we will not discuss these here). We just note that we can set up "equivalence classes" of regular vines, where two regular vines are in the same class if there is a permutation of the conditional distribution variables that maps one to the other. For example if $N=3$ then for a D-vine we have conditional distributions for $\{ (1,2), (2,3), (1,3)|2 \}$ and for a C-vine we have $\{ (1,2), (1,3), (2,3)|1 \}$; taking the permutation $\pi(1,2,3)=(2,1,3)$ we have equivalence - that is, C-vines and D-vines are equivalent for $N=3$ (of course this is not true in general).
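
To get a feel for how quickly this grows, we can evaluate the labelled regular vine count above for a few dimensions (a small sketch using only the Python standard library):

```python
from math import comb, factorial

def num_labelled_regular_vines(n):
    # Number of labelled regular vines on n elements:
    # C(n, 2) * (n - 2)! * 2^((n - 2)(n - 3) / 2)
    return comb(n, 2) * factorial(n - 2) * 2 ** ((n - 2) * (n - 3) // 2)

for n in range(3, 8):
    print(n, num_labelled_regular_vines(n))
# 3 -> 3, 4 -> 24, 5 -> 480, 6 -> 23040, 7 -> 2580480
```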

The real power of the regular vine copula comes through the following theorem from Bedford and Cooke (stated without proof).


Theorem:

Let $\mathcal{V} = \{T_1 \, , \, ... \, , \, T_{N-1} \}$ be a regular vine over $N$ elements. For any edge $e = \{e_1, e_2\} \in E(\mathcal{V})$ with conditioning set $D_e$, let a bivariate conditional copula be denoted: $C_{e_1,e_2 | D_e}$ with density: $c_{e_1,e_2 | D_e}$. Let marginal distributions be denoted as $F_i$ with corresponding densities $f_i$. Then the vine-dependent multivariate distribution is uniquely determined and has density: $$f_{X_1...X_N}(x_1,...x_N) = \left( \prod_{i=1}^N f_i(x_i) \right) \left( \prod_{e \in E(\mathcal{V})} c_{e_1,e_2 | D_e} ( F_{e_1 | D_e}(x_{e_1}) , F_{e_2| D_e}(x_{e_2})) \right) $$


Just as Sklar's theorem links a multivariate distribution to its marginals and a copula, this theorem does the equivalent for regular vine copulae. From before we know that we have many options for bivariate relations (for example we looked at the Gaussian, student-t, Frank, Clayton and Gumbel copulae in the previous blog post), whereas multi-variate relations are harder to specify (with typically only the Gaussian and student-t used in practice). The regular vine copula method allows us more flexibility in modelling multi-variate distributions. Moreover we can make use of conditioning arguments to help specify fairly specialised relations.

Through construction we can notice the following properties:

  • If each bivariate copula specified is of Gaussian type then the resulting multi-variate copula is also Gaussian. This also has the property that conditional correlation does not depend on the conditioning variables. We find similar results for the student-t
  • If edge $e = \{e_1, e_2 \} \in E_l$ with $l > 1$ and copula $C_e$ is more concordant than $C_e'$, then the margin $F_e$ is more concordant than the margin $F_e'$
  • If $C_e$ has upper/lower tail dependence for all $e \in E_1$ and the remaining copulae have support on $[0,1]^2$ then all bivariate margins of the joint distribution have upper/lower tail dependence
  • For parametric copulae $C_e(.; \theta_e)$ each bivariate marginal copula lies within the Fréchet upper/lower bounds (see previous blog post for details)

These points are a rather fancy way of saying that through the regular-vine construction we get a multi-variate distribution that behaves how we would like/expect.

We also now take a brief diversion into the definition of a partial correlation.


Definition: Partial Correlation

Let $X_1, \, ... \, , X_N$ be variables with zero mean and unit variance. For $i=1,...,N$ define numbers $b_{ij:\{1,...,N\} \backslash \{i,j\}}$ that minimise:

$$ \mathbb{E} \left[ \left( X_i - \sum_{j \neq i} b_{ij:\{1,...,N\} \backslash \{i,j\}} X_j \right)^2 \right]$$

We then define the partial correlation coefficient $\rho_{ij:\{1,...,N\} \backslash \{i,j\}}$ as:

$$ \rho_{ij:\{1,...,N\} \backslash \{i,j\}} = \operatorname{sgn}\left(b_{ij:\{1,...,N\} \backslash \{i,j\}}\right)\left(b_{ij:\{1,...,N\} \backslash \{i,j\}} \, b_{ji:\{1,...,N\} \backslash \{i,j\}}\right)^{1/2} $$

We can think of this as the correlation between the projections of $X_i$ and $X_j$ onto a plane orthogonal to the space spanned by all other variables. We can equivalently write this as:

$$ \rho_{ij:\{1,...,N\} \backslash \{i,j\}} = - \frac{K_{ij}}{\sqrt{K_{ii}K_{jj}}} $$

Where $K_{ij}$ is the $(i,j)$th co-factor of the correlation matrix.


There is nothing special about this definition that is unique to the vine copula methods we are describing. We could equally define this for any Gaussian or student-t or other elliptical multivariate distribution. For those a little rusty on their linear algebra the $(i,j)$th co-factor of a matrix is equal to the determinant of the matrix after removing the $i$th row and $j$th column multiplied by the factor: $(-1)^{i+j}$.
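
As a quick illustration (a sketch, not specific to vines), the partial correlation can be computed directly from a correlation matrix; the cofactor formula above is equivalent to reading off the elements of the precision (inverse correlation) matrix:

```python
import numpy as np

def partial_correlation(R, i, j):
    # Partial correlation of variables i and j given all the others, via the
    # precision matrix P = R^{-1}: rho_{ij;rest} = -P_ij / sqrt(P_ii * P_jj)
    # (equivalent to the cofactor formula above)
    P = np.linalg.inv(R)
    return -P[i, j] / np.sqrt(P[i, i] * P[j, j])

R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])
print(partial_correlation(R, 0, 1))  # correlation of X_1 and X_2 with X_3 "partialled out"
```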

We mention this here since we have the following theorem:


Theorem:

The regular vine provides a bijective mapping from $(-1,1)^{N \choose 2}$ to the set of positive definite matrices with unit diagonal (i.e. valid joint-elliptical correlation matrices)


This means that we can use regular vines as a parameterization tool for joint elliptical (e.g. Gaussian) multi-variate distributions. We do this by specifying partial correlations on each edge in $E(\mathcal{V})$.
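
As a minimal sketch of this idea (assuming a three-dimensional D-vine with edges $(1,2)$, $(2,3)$ in the first tree and $(1,3)|2$ in the second), the standard partial correlation recursion recovers the ordinary correlation $\rho_{13}$ from the assigned partial correlation $\rho_{13;2}$, and any choice of values in $(-1,1)$ yields a valid (positive definite) correlation matrix:

```python
import numpy as np

# Partial correlations assigned to the vine edges - any values in (-1, 1) are allowed
rho_12, rho_23 = 0.7, 0.5   # tree T_1 edges
rho_13_g2 = -0.4            # tree T_2 edge (1,3)|2

# Recursion: rho_13 = rho_13;2 * sqrt((1 - rho_12^2)(1 - rho_23^2)) + rho_12 * rho_23
rho_13 = rho_13_g2 * np.sqrt((1 - rho_12**2) * (1 - rho_23**2)) + rho_12 * rho_23

R = np.array([[1.0,    rho_12, rho_13],
              [rho_12, 1.0,    rho_23],
              [rho_13, rho_23, 1.0]])
print(np.linalg.eigvalsh(R))  # all eigenvalues positive => positive definite
```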

Dependency Between Discrete Variables

So far we have not differentiated between sampling continuous and discrete variables (e.g. we generate our copula and then use a generalized inverse transform to return dependent variables of arbitrary distribution, be they Gaussian or Poisson). Unfortunately things are not quite that simple if we are targeting a specific level of Pearson correlation (note that previously we showed why Pearson correlation is an imperfect measure, but in practical situations it is sometimes the only metric we have). Let's use the following definitions of variables:

  1. $(Z_1,...,Z_N)$ are samples from a multivariate distribution with marginal distribution functions $G_i(.)$
  2. $(U_1,...,U_N) = (G_1(Z_1),...,G_N(Z_N))$ are rank dependent uniforms corresponding to $Z_i$
  3. $(X_1,...,X_N) = (F^{-1}_1(U_1),...,F^{-1}_N(U_N))$ are our discrete variable transforms

We then have that in general:

$$Corr_{Pearson}(Z_i, Z_j) \neq Corr_{Pearson}(U_i, U_j) \neq Corr_{Pearson}(X_i, X_j)$$

This is not always the case for continuous variables (e.g. with a multivariate Gaussian and suitably chosen $F(.)$). To overcome this for discrete variables we often have to adjust our input parameters and sensitivity test until a desired output correlation is reached. In the case of large $N$ this quickly becomes prohibitive and we simply have to accept the biases our modelling procedure has; in some cases this can lead to large mis-statements in the model and thus we can draw incorrect conclusions.
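
A quick simulation makes the point (a sketch assuming a bivariate Gaussian dependency and Poisson margins; the exact numbers will vary with the parameter choices):

```python
import numpy as np
from scipy.stats import norm, poisson

rng = np.random.default_rng(42)
rho, n = 0.8, 200_000

# Step 1: correlated Gaussians Z
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)

# Step 2: rank-dependent uniforms U = G(Z)
u = norm.cdf(z)

# Step 3: discrete margins X = F^{-1}(U), here Poisson with small means
x = poisson.ppf(u, mu=[0.5, 2.0])

for name, s in [("Z", z), ("U", u), ("X", x)]:
    print(name, np.corrcoef(s[:, 0], s[:, 1])[0, 1])
# The Pearson correlation differs at each stage - the discrete transform in
# particular drags the output correlation below the input value of 0.8
```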

It has been shown (for Gaussian dependency with Poisson/generalized Poisson/negative-binomial marginals) that regular vines have lower biases and do a better job of representing the input calibration. This is especially useful in cases of high dispersion where the typical approaches can lead to large inaccuracies.

Sampling from Regular Vines

Sampling the D-Vine

We will now move on to the problem of sampling from the D-vine. We start with the D-vine since its structure makes it easier to sample from and provides a good foundation and further explanation of what we have seen so far.

Inverse Transform Method

Let $(u_1, u_2, u_3, u_4)$ be 4 iid samples from a uniform distribution. We use the notation $F_{i | j:x_j}$ for the cumulative distribution function of $X_i$ given that $X_j = x_j$. Then we can produce a sample $(x_1, x_2, x_3, x_4)$ from the joint distribution $(X_1, X_2, X_3, X_4)$ specified by our D-vine as:

\begin{align} x_1 &= u_1 \\ x_2 &= F^{-1}_{2|1:x_1}(u_2) \\ x_3 &= F^{-1}_{3|2:x_2}(F^{-1}_{3|12:F_{1|2}(x_1)}(u_3)) \\ x_4 &= F^{-1}_{4|3:x_3}(F^{-1}_{4|23:F_{2|3}(x_2)}(F^{-1}_{4|123:F_{1|23}(x_1)}(u_4))) \end{align}

We can simply repeat this to arbitrarily high dimension. A similar approach exists for regular vines but we need to be more careful about the labelling and ordering of variables.
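
To make these conditional inverses concrete, here is a minimal bivariate sketch assuming a Gaussian pair copula with parameter $\rho$ (for which the conditional CDF, often called the h-function, and its inverse are available in closed form); $x_2 = F^{-1}_{2|1:x_1}(u_2)$ then becomes a single function call:

```python
import numpy as np
from scipy.stats import norm

def h_inverse_gaussian(t, v, rho):
    # Inverse of the conditional CDF C_{2|1}(. | v) of a Gaussian pair copula:
    # returns u2 such that C_{2|1}(u2 | v) = t
    return norm.cdf(np.sqrt(1 - rho**2) * norm.ppf(t) + rho * norm.ppf(v))

rng = np.random.default_rng(0)
rho = 0.7
u1, u2 = rng.uniform(size=2)              # independent uniforms
x1 = u1
x2 = h_inverse_gaussian(u2, x1, rho)      # x2 = F^{-1}_{2|1:x1}(u2)
print(x1, x2)                             # a dependent pair from the Gaussian pair copula
```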

Acceptance-Rejection Method

From before recall that:

$$c(F_1(x_1), \, ... \, , F_N(x_N)) = \prod_{e \in E(\mathcal{V})} c_{e_1,e_2 | D_e} ( F_{e_1 | D_e}(x_{e_1}) , F_{e_2| D_e}(x_{e_2}))$$

We can use this in an acceptance-rejection scheme in the usual way: we generate multi-variate uniform variates and then accept/reject samples based on this density above. This is no different from what we have seen before and the usual observations about acceptance rates, etc. still apply.
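
As a rough sketch of such a scheme (for illustration the target is a single bivariate Frank copula density, which is conveniently bounded; the same pattern applies to the full vine density product above provided we can find a finite bound on it, which is not guaranteed for every choice of pair copulae):

```python
import numpy as np

rng = np.random.default_rng(1)

def frank_density(u, v, theta=5.0):
    # Bivariate Frank copula density - bounded, so suitable for simple rejection
    a = np.expm1(-theta)  # e^{-theta} - 1
    return -theta * a * np.exp(-theta * (u + v)) / (a + np.expm1(-theta * u) * np.expm1(-theta * v)) ** 2

def rejection_sample(density, n, dim=2, bound=10.0):
    # Propose uniforms on [0,1]^dim and accept each with probability density / bound
    out = []
    while len(out) < n:
        proposal = rng.uniform(size=dim)
        if rng.uniform() * bound <= density(*proposal):
            out.append(proposal)
    return np.array(out)

samples = rejection_sample(frank_density, 1_000)
```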

We can see that while the specification of vine copulae is slightly different to the copulae we looked at previously, we are able to sample using the same tools as before.

Generic Sampling Method

We now move on to sampling from vine copulae of general dimension. We will present the algorithms in pseudocode. In all examples the "output" will be the variables $(u_1,...,u_N)$, which are multi-variate uniform with the vine-copula-induced dependency. Other interim values will also be calculated.


Sampling from the C-vine

  1. Generate $w_1,...,w_N$ iid $U[0,1]$ variates
  2. Set: $u_1 = w_1$
  3. Set: $u_2 = C^{-1}_{2|1}(w_2 |w_1)$
  4. For $i > 2$:
    1. Set $t=w_i$
    2. For $j = i - 1, i-2,...,1$:
      1. $t = C^{-1}_{i |j:1,...,j-1}(t|w_j)$
    3. $u_i = t$
  5. Return: $(u_1,...,u_N)$

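As a sketch of this algorithm (assuming, for concreteness, that every pair copula is Gaussian so that the conditional inverse $C^{-1}_{i|j:1,...,j-1}$ is the closed-form Gaussian h-inverse; the dictionary `rho` of pair-copula parameters is purely illustrative):

```python
import numpy as np
from scipy.stats import norm

def h_inv(t, v, rho):
    # Inverse conditional CDF (h-inverse) of a Gaussian pair copula with parameter rho
    return norm.cdf(np.sqrt(1 - rho**2) * norm.ppf(t) + rho * norm.ppf(v))

def sample_cvine(rho, rng):
    """One sample from a C-vine with Gaussian pair copulas.

    rho[(j, i)] (j < i) parameterises the pair copula between variables i and j
    given {1, ..., j-1} (variables are 1-indexed).
    """
    N = max(i for _, i in rho)
    w = rng.uniform(size=N + 1)          # w[1..N]; index 0 unused
    u = np.zeros(N + 1)
    u[1] = w[1]
    for i in range(2, N + 1):
        t = w[i]
        for j in range(i - 1, 0, -1):    # j = i-1, ..., 1
            t = h_inv(t, w[j], rho[(j, i)])
        u[i] = t
    return u[1:]

rng = np.random.default_rng(7)
# Illustrative 3-dimensional C-vine: edges 12, 13 in T_1 and 23|1 in T_2
rho = {(1, 2): 0.8, (1, 3): 0.5, (2, 3): 0.3}
samples = np.array([sample_cvine(rho, rng) for _ in range(5_000)])
```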

Sampling from the D-vine

  1. Generate $w_1,...,w_N$ iid $U[0,1]$ variates. Initialize $N \times N$ matrices $A=(a_{i,j})$ and $B=(b_{i,j})$
  2. Set: $u_1 = a_{1,1} = b_{1,1} = w_1$
  3. For $i \geq 2$:
    1. $a_{i,1} = w_i$
    2. For $j=2...i$:
      1. $a_{i,j} = C^{-1}_{i |j-1:j,...,i-1}(a_{i,j-1}|b_{i-1,j-1})$
    3. $u_i = b_{i,i} = a_{i,i}$
    4. For $j=i-1,..,1$:
      1. $b_{i,j} = C_{j|i:j+1,...,i-1}(b_{i-1,j}|a_{i,j+1})$
  4. Return $(u_1,...,u_N)$

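Similarly, a sketch of the D-vine algorithm (again assuming Gaussian pair copulas so that both the conditional CDF - the h-function - and its inverse are in closed form; `rho[(a, b)]` is an illustrative parameter for the pair copula between variables $a$ and $b$ given the variables strictly between them):

```python
import numpy as np
from scipy.stats import norm

def h(u, v, rho):
    # Conditional CDF C_{U|V}(u | v) of a Gaussian pair copula
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1 - rho**2))

def h_inv(t, v, rho):
    # Inverse of the conditional CDF above
    return norm.cdf(np.sqrt(1 - rho**2) * norm.ppf(t) + rho * norm.ppf(v))

def sample_dvine(rho, N, rng):
    """One sample from a D-vine (variables ordered 1-2-...-N) with Gaussian pair copulas.

    rho[(a, b)] (a < b) parameterises the pair copula of variables a and b
    given {a+1, ..., b-1}, i.e. the edge at tree level b - a.
    """
    w = rng.uniform(size=N + 1)                    # index 0 unused
    a = np.zeros((N + 1, N + 1))
    b = np.zeros((N + 1, N + 1))
    u = np.zeros(N + 1)
    u[1] = a[1, 1] = b[1, 1] = w[1]
    for i in range(2, N + 1):
        a[i, 1] = w[i]
        for j in range(2, i + 1):
            a[i, j] = h_inv(a[i, j - 1], b[i - 1, j - 1], rho[(j - 1, i)])
        u[i] = b[i, i] = a[i, i]
        for j in range(i - 1, 0, -1):
            b[i, j] = h(b[i - 1, j], a[i, j + 1], rho[(j, i)])
    return u[1:]

rng = np.random.default_rng(11)
# Illustrative 4-dimensional D-vine parameters
rho = {(1, 2): 0.7, (2, 3): 0.6, (3, 4): 0.5,   # tree T_1
       (1, 3): 0.3, (2, 4): 0.2,                # tree T_2
       (1, 4): 0.1}                             # tree T_3
sample = sample_dvine(rho, 4, rng)
```
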
Sampling from an R-vine requires a little more work in labelling the variables due to the lack of structure. We use indexing:

$$ 12; \; k_{3,1}3; \; k_{3,2}3|k_{3,1}; \; ... \; ; k_{i,1}i; \; k_{i,2}i | k_{i,1}; \; ... \; ; k_{i,i-1}i|k_{i,1},...,k_{i,i-2}; \; ... \; ; k_{N,1}N; \; ... \; ; k_{N,N-1}N|k_{N,1},...,k_{N,N-2} $$

With $(k_{i,1}, ... , k_{i,i-1})$ some permutation of $(1,...,i-1)$ (e.g. for the C-vine $k_{i,j} = j$ and for the D-vine $k_{i,j} = i-j$). With this in mind we can move on to general R-vine sampling.


Sampling from the R-vine

  1. Generate $w_1,...,w_N$ iid $U[0,1]$ variates. Initialize $N \times N$ matrices $A=(a_{i,j})$ and $B=(b_{i,j})$
  2. Set: $u_1 = a_{1,1} = b_{1,1} = w_1$
  3. For $i \geq 2$:
    1. $M= \{ k_{i,1}, ... , k_{i,i-2} \}$
    2. $v_{i,k_{i,i-1},M} = v_{i, \{ k_{i,i-1} \cup M \}} = w_i$
    3. $a_{i,1} = w_i$
    4. For $j = 2,...,i$:
      1. $M= \{ k_{i,1}, ... , k_{i,i-j} \}$
      2. $v_{i,M} = C^{-1}_{i |k_{i,i+1-j}:M}(a_{i,j-1}|v_{k_{i,i+1-j},M})$
      3. $a_{i,j} = v_{i,M}$
    5. $u_i = a_{i,i}$
    6. $b_{i,i} = a_{i,i}$
    7. For $j=i-1,...1$:
      1. $M= \{ k_{i,1}, ... , k_{i,i-1-j} \}$
      2. $v_{k_{i,i-j},i,M} = C_{k_{i,i-j}|i:M}(v_{k_{i,i-j},M}|a_{i,j+1})$
      3. $b_{i,j} = v_{k_{i,i-j},i,M}$
  4. Return $(u_1,...,u_N)$

Vine-Copula Packages

Thankfully there are pre-canned packages that we can use to sample from vines (as well as to perform calibration procedures and other things we have not looked at in this blog). The R language perhaps has the most widely used (given R's popularity among statisticians) in the "VineCopula" package. In Python we have the following: pyvine, pyvinecopulib and others.
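
As a rough indication of what using such a package looks like (based on pyvinecopulib; the exact class and argument names may differ between versions, so treat this as indicative rather than definitive and check the package documentation):

```python
import numpy as np
import pyvinecopulib as pv

# u: pseudo-observations (data transformed to have uniform margins), shape (n_obs, n_dims)
u = np.random.default_rng(3).uniform(size=(1000, 4))

# Restrict the candidate pair-copula families and fit an R-vine structure to the data
controls = pv.FitControlsVinecop(family_set=[pv.BicopFamily.gaussian, pv.BicopFamily.gumbel])
cop = pv.Vinecop(u, controls=controls)

# Draw new samples with the fitted vine-copula dependency
samples = cop.simulate(10_000)
```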

Conclusion

In this blog post we have seen a gentle introduction to the concept of vine copulae: what they are, how to classify them and how to sample from them. In general vine copulae are useful in modelling high-dimension multi-variate systems. They tend to perform best (in this author's experience) when the vine structure is sparse (i.e. lots of pairwise relations are independent), and on occasion very tail-dependent behaviour can cause instability (although this is also true of other copula methods). Although we have not covered the process of calibrating vine copulae, there are a lot of powerful methods available to us to do this. Vine copulae are increasingly used as a way of simulating "synthetic data" in order to feed the ever more data-hungry deep machine learning methodologies.


References

The homepage for all things vine copula is: homepage

In preparing this blog post I relied heavily on the book:

Dependence Modelling: Vine Copula Handbook - Dorota Kurowicka and Harry Joe