R Code for Correlation Matrix and Heatmap

In response to a comment, here is the R code used to generate the data and the heat maps in the preceding post.  Lots of explanatory comments added.

Apologies in advance for the layout issues - I can never seem to get these boxes to display right!

# Input variables

path <- "C:/ ... /Futures/"
sub.contracts <- "Contracts/"
sample.size <- 500 # how many pairs to use in a sample
sample.interval <- 250 # how many bars between each test


# process inputs, load libraries

library(gplots)


# Get some data

# list of filenames of contract data and user-friendly names.
contract.data <- read.csv(paste(path, sub.contracts, "ContractNames.txt", sep=""),
header=F, stringsAsFactors=F)

# Set up data table and fill first column

# read in first contract (date, OHLC, vol, OI, rawClose)
next.instrument <- read.csv(paste(path, contract.data[1,1], sep=""), header=F)
# set up a matrix, name the rows (dates) and the columns (friendly names)
instrument.closes <- matrix(NA , nrow=length(next.instrument[ , 5]), ncol=dim(contract.data)[1],
  dimnames=list(next.instrument[ , 1], contract.data[, 2]))
# load the closes for first contract in the first column
instrument.closes[ , 1] <- next.instrument [ , 5]


# cycle through remainng contracts and add them to table

# for each contract
for (j in 2:dim(contract.data)[1]){
  # read in the next instrument contract data
  next.instrument <- read.csv(paste(path, contract.data[j, 1], sep=""), header=F)
  # keep only rows for which a date is present in the new contract
  instrument.closes <-
    instrument.closes[rownames(instrument.closes) %in% next.instrument[ , 1], ]
  # keep only rows for which a date is present in the existing contract matrix
  instrument.closes[ , j] <-
    next.instrument[next.instrument[ , 1] %in% rownames(instrument.closes), 5]
}

# Divide the closes less the first row by the closes less the last row
  and take logs (to get daily % changes cont. comp.)
log.diffs <- log(instrument.closes[-1, ] / instrument.closes[-dim(instrument.closes)[1], ])
# Number of samples to analyze
samples <- ((dim(log.diffs)[1] - sample.size) %/% sample.interval) + 1

for (i in 1:samples){
  # start index for each block of data
  start <- dim(log.diffs)[1] - (samples - i) * sample.interval - sample.size + 1
  # end index
  end <- start + sample.size - 1
  # calculate the correlation matrix
  cor.log.diffs <- cor(log.diffs[start:end, ])
  # open a new graphics device
  dev.new()
  # a chart title
  map.title <- paste(sample.size, " Bars: ", rownames(log.diffs)[start], " - ", rownames(log.diffs)[end], sep="")
  # display the heatmap
  heatmap.2(cor.log.diffs, trace="none", dendrogram="none", margins=c(8, 8), main=map.title)
  # Wait for the user to hit "enter"
  browser()
}

1 comment:

Jez Liberty said...

Cool - thanks for posting this. A bit in a rush right now but will check it out more thoroughly at the weekend..

Post a Comment

Note: Only a member of this blog may post a comment.

Get widget