Random Correlated Series Generator

It is often a challenge to de-bug code that involves large numbers of long stochastic series - it is very easy to think you have it right and not so easy to make sure. Lately I have needed to generate random correlated series whose means and covariance characteristics I know so I can verify various calculation procedures. I thought I would share a small function I wrote in R that generates the series.

I wanted to be able to provide a correlation matrix together with a set of means and standard deviations and get back a set of series as the columns of a matrix so many rows long.

The basic procedure is as follows:
  1. provide the following inputs:
    • The required series length (i.e. number of periods) defaulting to 1,000.
    • The correlation matrix defaulting to a 1 x 1 identity matrix.
    • A vector of means, one for each series, defaulting to 0.
    • A vector of standard deviations, one for each series, defaulting to 1.
  2. Check the inputs make sense or are interpretable in some meaningful way.
  3. Figure out from the dimensions of the correlation matrix how many series are requested.
  4. Build matrices for both the means and standard deviations that matches the series that will be generated (i.e. series length rows x number of series columns.
  5. Create a matrix filled with random numbers ~N(0,1) with series length rows and number of series columns.
  6. Complete a Cholesky decomposition of the correlation matrix and multiply (using matrix multiplication) by the random number matrix.
  7. This matrix, which has the desired correlation characteristics, is then scaled by the desired standard deviations and shifted by the desired means to get the final set of series to be returned.

Here's the code:

# Ian Rayner: 2011-02-07

# Generates a set (1, or dimension of user-supplied correlation matrix) of
# series of length len.series. If no mean or sd vectors are supplied, they
# are assumed to be 0 and 1 respectively.
# Checks correlation matrix is symmetric and all values are between [-1, +1].
# Checks that the means and sds vectors, if supplied match the dimensions
# of the correlation matrix.


CorrelatedRandomSeries <- function(len.series=1000, cor.matrix=matrix(1,nrow=1), means=0, sds=1){

# Input validation
  if (!identical(cor.matrix, t(cor.matrix)))
    stop("Correlation matrix should be symmetric")
  if ((max(cor.matrix) > 1) | (min(cor.matrix) < -1))
    stop("All values in correlation matrix must be on [-1, +1]")
  if ((length(means) != dim(cor.matrix)[1]) & (length(means) != 1))
    stop("Means must match correlation matrix dimensions")
  if ((length(sds) != dim(cor.matrix)[1]) & (length(sds) != 1))
    stop("Standard deviations must match correlation matrix dimensions")

# Infer number of series from correlation matrix.
  num.series <- dim(cor.matrix)[1]

# If single valued, simply repeat for entire series.
  means <- matrix(means, nrow=len.series, ncol=num.series, byrow=T)
  sds <- matrix(sds, nrow=len.series, ncol=num.series, byrow=T)

# Create random series ~N(0,1)
  series <- matrix(rnorm(num.series * len.series, 0, 1), ncol=num.series)

# Use Cholesky decomposition to build correlation then scale and shift by means / sds
  return(sds*(series %*% chol(cor.matrix)) + means)
}


Aside: Cholesky Decomposition

I searched high and low for proof that the Cholesky decomposition of a n x n correlation matrix (Rho) transforms an array of n random vectors (~N(0,1)) into a set of vectors with the correlation coefficients specified in Rho. I couldn't find one, so here's my version of the proof:

\boldsymbol{\rho } \textup{ is a correlation matrix with elements } \rho_{i,j}

\textup{ \textbf{L} is a lower triangular matrix with elements L}_{i,j} \textup{, such that } \boldsymbol{\rho } \textup{ = \textbf{LL}}^{T}

\textup{i.e. \textbf{L} is the Cholesky decomposition of }\boldsymbol{\rho }

\begin{pmatrix}
\rho _{1,1} & \rho _{1,2} & . & \rho _{n,1}\\
\rho _{2,1} & \rho _{2,2} & . & \rho _{n,2}\\
. & . & . & .\\
\rho _{n,1} & \rho _{n,2} & . & \rho _{n,n}
\end{pmatrix} = \begin{pmatrix}
L_{1,1} & 0 & . & 0\\
L_{2,1} & L_{2,2} & . & 0\\
. & . & . & .\\
L_{n,1} & L_{n,2} & . & L_{n,n}
\end{pmatrix}\begin{pmatrix}
L_{1,1} & L_{2,1} & . & L_{n,1}\\
0 & L_{2,2} & . & L_{n,2}\\
. & . & . & .\\
0 & 0 & . & L_{n,n}
\end{pmatrix}


\textup{ =} \begin{pmatrix}
L_{1,1}.L_{1,1} & L_{1,1}.L_{2,1} & . & L_{1,1}.L_{n,1}\\
L_{2,1}.L_{1,1} & L_{2,1}.L_{2,1} + L_{2,2}.L_{2,2} & . & L_{2,1}.L_{n,1} + L_{2,2}.L_{n,2}\\
. & . & . & .\\
L_{n,1}.L_{1,1} & L_{n,1}.L_{2,1} + L_{n,2}.L_{2,2} & . & L_{n,1}.L_{n,1} + L_{n,2}.L_{n,2} + . + L_{n,n}.L_{n,n}
\end{pmatrix}


\textup{Summarizing: } \rho _{i,j} = \rho _{j ,i} = \sum_{k = 1}^{k = i}L_{i,k}L_{j,k} \textup{ } \forall \textup{ } i\leq j


\textbf{Z} \textup{ is a column vector of independent random variables } \sim \textit{N}\left ( 0, 1 \right )

\textup{The ith element of \textbf{Z} is Z}_{i}

\textup{Define \textbf{X} = \textbf{LZ}. The ith element of \textbf{X} is X}_{i}


X_{i} = L_{i,1}Z_{1} + L_{i,2}Z_{2} + . + L_{i,i}Z_{i}

X_{j} = L_{j,1}Z_{1} + L_{j,2}Z_{2} + . + L_{j,j}Z_{j}


\textup{The correlation coefficient of } X_{i} \textup{ and } X_{j} \textup{, } i\leq j \textup{, is}

\rho _{i,j} = X_{i}.X_{j} \textup{ and } Z_{i}.Z_{j} = \delta _{i,j} \textup{ (because Z}_{i} \textup{ are uncorrelated with variance of 1) therefore, }

\rho _{i,j} = L_{i,1}L_{j,1} + L_{i,2}L_{j,2} + . + L_{i,i}L_{i,i} = \sum_{k = 1}^{k = i}L_{i,k}L_{j,k}


which is the same as above!

Edit: Fixed LaTeX rendering problem.

0 comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Get widget