I will start by looking at an extremely simple hypothetical to get a sense of the mechanics of the algorithm. Then I will set up a test case using the same surface as in my previous post. I will set up a base case set of algorithm tuning parameters and see what happens as each parameter value is varied.
I want to arrive at some guidelines for choosing the main tuning parameters for the algorithm. Here they are:
- hms - harmony memory size.
- hmcr - harmony memory change rate
- par - pitch adjustment rate
- fw - fret width*
* Note: my fret width means "percent of range of the variable in question". If xi has a range (defined by me as the search space) from xi, min = 10 to xi, max = 20 and fw = 25%, then the pitch adjustment will be a value picked at random from the range +/- 1.25.
Consider HM with with hms members. The diagram shows the ith dimension of the search space and all but the jth member of HM removed. So we see only xij which is selected as the "parent" with a probability of 1 / hms. The diagram shows the different ways in which a new candidate for the ith dimension, (..., xi', ...) could be generated.
Edit: in the diagram above, x1 and fw1 should read xi and fwi.
When I use HS I want to ensure that the entire search space has some chance of being included. This means not only that I want a chance of seeing anywhere from xi, min to xi, max for each xi, but also every combination of values of x1, x2, etc. The diagram hints at how changes in hmcr, par and fw will affect the search process.
When I refer to the "base case" I mean hms = 16, hmcr = 75%, par = 25%, fw = 25%. The search is terminated when a quorum of 75% of the members of HM (i.e. 12 of 16) agree to within 2 significant figures on the location (not the value) of the maximum or after 5,000 cycles. I run each test 50 times so that if there are no failures (finding a local maximum not the global) I can be 90% confident that the failure rate for that HS set-up is less than 0.5%.
I judge the performance of the HS based on the highest maximum value in HM after the search terminated for EACH of the 50 trials; I refer to this as the "Best" value. I look at the median, maximum, average and standard deviation of Best across all 50 trials. I also look at the number of cycles taken to reach the end of the search and the number of failures in each case.
The global solution for the test case is x = 0.184, y = 0.350 => z = 0.9169. The base case for example, gave a max Best = 0.9169 with a median of 0.9166 and a mean of 0.9107 +/- 0.0302 over 50 trials. Solution took an average of 1,263 cycles, ranging from 442 - 2,877. There were 2 failures - finding the second-highest maximum which you can pick out on the other side of the origin.
Harmony Memory Size - hmsObviously hms >= 1. I want a reasonable size harmony memory so that the odds of getting a quorum around the wrong maximum are relatively low - a quorum of 1 isn't useful. I found that using hms = 2 (1 for each dimension of the search space) rather than 16 lead to 13 "failures" in 50 trials. With as few as hms = 4, adjusting hmcr to 50% I was able to get a median Best of 0.9168 and a mean Best of 0.9166 with no failures in 50 trails. The process took 2,225 cycles on average with a range of 128 - 5,000 cycles. So a tentative rule of thumb might be 2 x number of dimensions or 2 ^ dimensions, maybe eve 1 x dimensions but not less than .. some lower bound. Further exploration of more complicated set-ups is needed.
To use a "genetic" analogy, small populations lead to in-breeding and evolutionary dead-ends (failures). Larger starting populations seed the space better leading to more consistent and better results.
harmony memory change rate - hmcrReferring to the top diagram, the role of hmcr is to control the rate at which the search is seeded with completely random values rather than "breeding" from an existing value. The lower hmcr the more often a random value is thrown into the mix and the more thoroughly the optimization space is explored making it less likely the wrong maximum will be reached. The trade-off is that once the HM is on the track of the global optimum, cycles spent randomly hitting the rest of the search space are essentially "wasted" cycles. It seems as though some intelligence regarding the state of the search could be used to make the algorithm more efficient.
If hmcr = 0% then the search is essentially random. In testing with 0%, every one of 50 trials ran to 5,000 cycles but the true maximum was located in each case. With hmcr = 100% (i.e. no random searching other than when HM is first set up) there were 6 failures. The failure count dropped as hmcr was reduced reaching zero at hmcr = 50%. Note in the 2-dimensional case, if hmcr = 50%, then every 4th candidate solution is totally random and 75% have at least one random value (vs 12.5% and 44% for the base case). As expected, decreasing hmcr resulted in increasing cycles; hmcr 50% (vs 75% in the base case) increased base case cycles from 1,263 to 1,944, but there were no failures.
If you are working with d dimensions, then (1 - hmcr^d) candidates will have at least one totally random component. This suggests that as d increases, hmcr also should increase to avoid flailing around in randomness.
Pitch Adjustment Rate - parThe role of par is to determine how often, given we are going to pull from the existing population, we accept the "gene" as is and how often we are going to "mutate" it.
In my testing I found an odd behavior: As par increased, the failure count decreased (the algorithm was more effective), but the cycle counts decreased at first and then increased. I think this is because my base case uses hmcr = 75%, so par is actually performing an important role in "jumping" from one evolutionary path (up the wrong maximum) to the right one. This is not really what it is designed for: it should be "fine tuning" the solution as it homes in on the maximum.
The lowest cycle count occurred at par = 25%
Fret Width - fwThe larger fw, the more par / fw function like (1 - hmcr). Obviously, if fw is 100% then the adjustment to xi is sampled from the same space as the random value for xi shown in the first box in the diagram above. As you should by now expect, as fw goes up the failure count goes down and the cycle count goes up.
I suspect that fw needs to be wide enough to jump across areas of roughness in the surface so the algorithm does not get stuck on a minor peak that is on the slopes of the major peak. Perhaps some exploration of selected areas of the surface might be useful to guide in the selection of fw. Some intelligence might be employed in dialing fw down as the algorithm homes in on the optimum if you must find the absolute pinnacle!
ConclusionsThink of hmcr as doing the heavy lifting of getting the algorithm on the track of the right maximum, while par / fw home in on the peak of the objective function. In evolutionary terms, hmcr enables competition with candidates climbing other potential maxima while par / fw enables competition between candidates climbing the same maximum.
It seems like many combinations of hms, hmcr, par and fw can achieve similar performance. Since, for system trading, we are interested in broad or robust optima perhaps we should favor values that allow for broad exploration and large jumps
The approach I am going to take is to create a test case with the same dimension count (and types: discrete / continuous) as the problem I intend to solve and use the test case to identify "good" parameters to use for the HS algorithm.
I believe I have figured out a way to implement this process directly in Trading Blocks Builder - something to blog about later.
Another issue I must address is at what stage to apply the Leverage Space Trading Model. I think it needs to be inside the optimization process, not on top of it.