Taleb: Silent Risk, Section 1.4.4 Mean Deviation vs Standard Deviation

We are going to play around with a mixture distribution made up of a large proportion of ~N(0, 1) and a small proportion of ~N(0, 1+a). The wider distribution is "polluting" the standard normal distribution. We are going to see that mean absolute deviation is a more efficient estimator of the distribution's dispersion than standard deviation. We are also going to see some unexpected (by me, at least) behavior in response to the level of pollution of the base distribution, the dispersion of the polluting distribution, and the sample size.
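The experiment can be set up in a few lines (a sketch; the pollution level p, the value of a, and the reading of 1 + a as the polluting variance are my illustrative choices, not values from the text):

```r
# Sketch: compare the sampling variability of mean absolute deviation vs
# standard deviation on a polluted normal. p, a, n, and trials are
# illustrative choices.
set.seed(42)
p <- 0.01; a <- 8; n <- 1000; trials <- 2000
mad_est <- sd_est <- numeric(trials)
for (t in seq_len(trials)) {
  pollute <- runif(n) < p                # which draws come from the wide component
  x <- rnorm(n, mean = 0, sd = ifelse(pollute, sqrt(1 + a), 1))
  mad_est[t] <- mean(abs(x - mean(x)))   # mean absolute deviation
  sd_est[t]  <- sd(x)                    # standard deviation
}
cv <- function(v) sd(v) / mean(v)        # relative variability; smaller = more efficient
c(mad = cv(mad_est), sd = cv(sd_est))
```

The rare polluting draws enter the standard deviation squared, so they whip the SD estimate around much more than they do the mean absolute deviation.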

R Utility for Mixture Distribution

Here's a useful little utility for producing mixture distributions in R. I needed a basic version of this for investigating some of Taleb's ideas, and decided to put together an easy-to-use general version. Here's the code followed by some instructions:
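A minimal version of such a utility might look like this (a sketch; the name `rmixture` and its interface are my assumptions):

```r
# A general mixture sampler (a sketch, not the original utility).
# rfuns:   list of random generators, each taking the sample size n
# weights: mixture proportions, one per component (normalized internally)
rmixture <- function(n, rfuns, weights) {
  weights <- weights / sum(weights)
  # pick a component for each of the n draws
  k <- sample(seq_along(rfuns), size = n, replace = TRUE, prob = weights)
  x <- numeric(n)
  for (i in seq_along(rfuns)) {
    idx <- which(k == i)
    if (length(idx) > 0) x[idx] <- rfuns[[i]](length(idx))
  }
  x
}

# Example: 99% N(0, 1) polluted by 1% N(0, sqrt(1 + a)), a = 99
a <- 99
x <- rmixture(1e5, list(function(n) rnorm(n),
                        function(n) rnorm(n, sd = sqrt(1 + a))),
              weights = c(0.99, 0.01))
```

Weights that don't sum to 1 are fine, since they are normalized internally.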

CTA Fee Structure


It is well known that there is increasing pressure on CTA fees (google interviews with David Harding, Cliff Asness, etc.). We see this in the proliferation of smart beta products offered with relatively low fees. Institutions don't want to pay for beta. Putting aside what beta means in the CTA world, let's look at what their options are for fees.

I have noticed CTAs taking one of two different routes:
  • A slimmed-down program with management fees only or low incentive fees (for example, Altis has just launched a management-fee-only "Pure Trend" program at 1%, and Cantab offers its "Core Macro" program at 1/2 & 10)
  • Incentive fee only programs (for example, Dunn has charged 0 & 25 for a long time, QIM is offering 0 & 30 via Kettera Strategies Hydra Platform).
I suspect there is a temptation for a lot of new CTAs to offer incentive fee only programs too, but I don't have any data to support this.

The rationale offered for incentive fees in general is the notion that the client is paying for what they want the most: performance. It puts the client and the trader on the same side. Offering an incentive fee only structure takes this argument to its logical extreme: the client only pays for performance. But is this really true?
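A toy calculation makes the trade-off concrete (hypothetical numbers; this ignores high-water marks, hurdle rates, and crystallization frequency, and assumes the incentive fee is charged on the return net of the management fee):

```r
# Toy fee comparison (hypothetical numbers; ignores high-water marks,
# hurdles, and crystallization frequency).
fees <- function(gross_return, mgmt, incentive) {
  perf <- max(gross_return - mgmt, 0)   # incentive charged net of mgmt fee
  mgmt + incentive * perf               # total fees as a fraction of NAV
}
fees(0.10, 0.02, 0.20)  # 2 & 20 on a 10% gross year -> 0.036
fees(0.10, 0.00, 0.30)  # 0 & 30 on the same year    -> 0.03
fees(-0.05, 0.02, 0.20) # down year: 2 & 20 still charges 2% -> 0.02
fees(-0.05, 0.00, 0.30) # down year: 0 & 30 charges nothing  -> 0
```

In a flat or down year the incentive-only client pays nothing while the 2 & 20 client still pays the management fee, which is the "only pays for performance" pitch in its starkest form.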

How Different Are These Things From One Another (Category & Mixed Data)?

In an earlier post I was looking at distance measures for clustering. In a still earlier post I had referred to analyzing hedge fund regulatory data using clustering to try to put the funds into groups by inferred strategy. I had to solve a problem with clustering that has been bothering me for a while: how do you measure distances between observations when the data is sparse? In my case the problem is further compounded by order-of-magnitude differences in the values for one observation vs. another (a Pareto distribution).

I have downloaded from the SEC's IAPD website and from the NFA's BASIC website a lot of information about the funds operated by 70 of the largest hedge funds according to the 2015 version of Institutional Investor's Alpha Hedge Fund 100. My hypothesis is simple: managers adopting similar market strategies (as distinct from trading strategies) will tend to offer similar funds in the marketplace and use similar names for them.

I dismantle all the names of the funds to create a dictionary of "fund words". This is harder than it sounds - there is a ton of clean-up to do, including filtering out meaningless words like brand names, numbers, forms of organization, etc., not to mention the outrageous number of spelling mistakes! For each manager I count up the number of times a word appears and also total up the $AUM associated with each word, based on the AUM of the funds that use the word.

For example, Anchorage reports 6 funds with $2.8bn AUM with the word "CLO" in their names, 13 funds with $1bn AUM with the word "Credit", and so on. After all the filtering, Anchorage ends up with only about 13 meaningful words in its vocabulary.
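The counting step can be sketched like this (the fund names and AUM figures below are invented for illustration; the real inputs are the IAPD/BASIC filings):

```r
# Sketch: per-manager word counts and word-AUM tallies from fund names
# (fund names and AUM figures are invented for illustration).
funds <- data.frame(
  name = c("Anchorage CLO Fund I", "Anchorage Credit Opportunities",
           "Anchorage CLO Fund II"),
  aum  = c(500, 1000, 300)                        # $m, made up
)
stopwords <- c("anchorage", "fund", "i", "ii")    # brand names, numerals, etc.
words <- lapply(strsplit(tolower(funds$name), "\\s+"),
                function(w) unique(setdiff(w, stopwords)))
# one row per (word, fund), carrying that fund's AUM
tally <- do.call(rbind, Map(function(w, a) data.frame(word = w, aum = a),
                            words, funds$aum))
tally$count <- 1
aggregate(cbind(count, aum) ~ word, data = tally, FUN = sum)
```

The output is one row per word with its fund count and total associated AUM, which is exactly the per-manager vocabulary described above.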

My overall dictionary across all managers in my base case includes 512 words (across my cases it ranges from 50 to 1800 words). So you can see that two managers are likely to share only a few words in common. The opposite problem is that two managers may have very similar vocabularies, but one may have an order of magnitude more $AUM associated with the same words. Traditional distance measures like Manhattan or Euclidean will be dominated by the lack of overlap or by the sheer difference in overall AUM. This is the problem I have sought to solve.

I have come up with an approach that appeals to me, and I want to share it. First, I want to look at how we measure distance between observations when the data are categorical. Then I want to show how I think the categorical approach can be combined with numerical data that is sparse.
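Two standard building blocks worth keeping in mind here are Jaccard distance on the word sets and cosine distance on log-scaled AUM vectors (sketches only; this is not necessarily the approach developed in what follows):

```r
# Two standard distance building blocks (sketches, not my final approach).
# Jaccard: pure set overlap, ignores AUM entirely.
jaccard_dist <- function(a, b) {
  1 - length(intersect(a, b)) / length(union(a, b))
}
# Cosine on log-scaled AUM: log1p damps the Pareto-like order-of-magnitude
# differences so sheer size doesn't dominate the comparison.
cosine_dist <- function(x, y) {
  x <- log1p(x); y <- log1p(y)
  1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}
jaccard_dist(c("clo", "credit"), c("credit", "macro", "trend"))  # -> 0.75
```

Jaccard handles the sparse-overlap problem but throws away the dollar weights; the log-cosine variant keeps the weights while taming their scale. Combining the two ideas is the direction explored next.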

Inverse Totient Procedure

When I have time, I enjoy solving the problems at Project Euler. I have solved 177 problems as of today using R as my primary tool. In fact, I found Project Euler when I was looking for problems I could use to learn R. At one point I was third ranked by problems solved in the R listings, but I have since slipped to sixth - the takeaway is that the R-crew are not the cream of the crop on Project Euler!

Leonhard Euler came up with the totient function (the totient of n is the number of positive integers less than n that are coprime to n). Not surprisingly, totients feature in a number of the Project Euler problems. One I have been struggling with involves inverting the totient function. This is not straightforward because a given totient value can arise from more than one number. For example, 3, 4, and 6 all have a totient of 2 (1 and 2 are coprime to 3; 1 and 3 are coprime to 4; 1 and 5 are coprime to 6).
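Computing totients in the forward direction is straightforward; a trial-division version (a simple sketch, fine for small n) uses Euler's product formula φ(n) = n · Π(1 − 1/p) over the distinct prime factors p of n:

```r
# Euler's totient via trial-division factorization (sketch, fine for small n).
totient <- function(n) {
  result <- n; m <- n; p <- 2
  while (p * p <= m) {
    if (m %% p == 0) {
      while (m %% p == 0) m <- m %/% p
      result <- result - result %/% p   # multiply result by (1 - 1/p)
    }
    p <- p + 1
  }
  if (m > 1) result <- result - result %/% m  # leftover prime factor
  result
}
sapply(c(3, 4, 6), totient)  # -> 2 2 2, matching the example above
```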

I have searched all the usual places for a procedure to invert the totient function. I found answers to specific questions (e.g. if the totient of n is 1000, what is n?). I found academic papers that provide procedures, but I couldn't find a nice simple recipe. So based on what I found, here's one ...
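For scale, the obvious brute force (not the recipe referred to above, and hopeless at Project Euler sizes) simply scans candidates and keeps the matches; note the search bound used here is a heuristic, not a proven one:

```r
# Brute-force totient inversion (a sketch; useful as a correctness check,
# far too slow for large totients).
totient <- function(n) sum(sapply(seq_len(n), function(k) {
  a <- n; b <- k                       # gcd via Euclid's algorithm
  while (b != 0) { t <- b; b <- a %% b; a <- t }
  a == 1                               # count k coprime to n
}))
inverse_totient <- function(m, limit = 4 * m) {
  # limit is a heuristic bound: preimages of m can in principle exceed it
  which(sapply(seq_len(limit), totient) == m)
}
inverse_totient(2)  # -> 3 4 6
```

Any real procedure has to do better than this, typically by building candidates from primes p with p − 1 dividing m, which is the shape of the recipes in the literature.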