Data Science: Shrinking Big Data into Meaningful Data

In this post I explain how less is more when it comes to using “big data.”  The best data is concise, meaningful, and actionable. It is both an art and a science to turn large, complex data sets into meaningful, useful information. Just like the later paintings of Monet capture the impression of beauty more effectively than a mere photograph, “small data” can help make sense of “big data.”

Claude Monet, London (the sun through the fog)

There is beauty in simplicity, but capturing simplicity is not simple. A young child’s drawings are simple too, but they are very unlikely to capture light and mood the way Monet did.

Worry not. There will be finance and math, but I will save the math for last, in an attempt to retain the interest of non-“mathy” readers.

The point of discussing impressionist painting is to show that reduction (taking things away) can be a powerful tool. In fact, filtering out “noise” is both useful and difficult. A great artist can filter out the noise without losing the fidelity of the signal. In this case, the “signal” is emotion, color, and light as perceived by a master painter’s mind.

 

Applying Impressionism to Finance

Massive amounts of data are available to the financial professional. Two questions I have been asking at Sigma1 since the beginning are: 1) How can we use “Big Compute” to crunch that data into better portfolios? 2) How should we represent that data to humans, both investment pros and lay folk whose money is being invested? After considerable thought, brainstorming, listening, and learning, I think we are beginning to construct a preliminary picture of how to do that, literally.

Relationships between Portfolio Assets

While not as beautiful as a Monet painting, the picture above is worth a thousand words (and likely many thousands of dollars over time) to me. The assets above constitute all of the current non-CASH building blocks of my personal retirement portfolio. While simple, the above image took considerable software development effort and literally millions of computations to generate [millions is very do-able with computers].

This simple-looking image conveys complex information in an easy-to-understand form. The four colors (red, green, blue, and purple) convey four asset types: fixed income, US stocks, international stocks, and convertible securities. The angle between any two asset lines conveys the relative correlation between the pair. In portfolio construction, larger angles are better. Finally, the length of each line represents the “effectiveness” with which that asset represents its “angular position” within the portfolio (in addition to other information).

With Powerful Data, First Comes Humility, Next Comes Insight

I have applied the same visualizations to other portfolios, and I see that, according to my software, many of the assets in professionally managed portfolios exhibit superior “robustness” to my own. As someone who takes pride in having a kick-ass portfolio, I found this information humbling, and it took some time to absorb from an ego standpoint. But, having gotten over it, I now see potential.

I have seen portfolios with a significantly wider angle than my current portfolio. What does this mean to me? It means I will begin looking for assets to augment my personal portfolio. Before I do that, let me share some other insights. The plot combines covariance matrix data for the 16 assets in the portfolio with semi-variance data for each asset. Without getting too “mathy” yet, the data visualization software reduces 136 pieces of data down to 32 (excluding color). The covariance matrix and semi-variance calculations are themselves reducers: they condense 5 years of monthly total-return data, 976 data points, down to 120 unique covariance numbers and 16 semi-deviation numbers. Taking 976 down to 32 results in a compression ratio of 30.5:1.
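
The counts above follow directly from the number of assets. A quick check, assuming 16 assets, a symmetric covariance matrix whose unique off-diagonal entries are kept, one semi-deviation per asset, two plotted numbers (angle and length) per asset line, and the 976 raw data points quoted above:

```python
# Data-reduction arithmetic for the 16-asset plot (figures as quoted in the text).
n_assets = 16

# Unique off-diagonal entries of a symmetric n x n covariance matrix.
unique_covariances = n_assets * (n_assets - 1) // 2

# One semi-deviation figure per asset.
semi_deviations = n_assets

inputs_to_plot = unique_covariances + semi_deviations

# Each asset line is drawn with two numbers: an angle and a length.
plotted_numbers = 2 * n_assets

raw_data_points = 976  # five years of monthly total-return data, as stated

print(unique_covariances, inputs_to_plot, plotted_numbers)
print(f"compression ratio: {raw_data_points / plotted_numbers}:1")
print(f"numbers per asset: {inputs_to_plot / n_assets}")
```

The last line is the same 136 / 16 = 8.5 figure that shows up later as each asset’s effective dimensionality.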

Finally, as it currently stands, the visualization software and resulting plot say nothing about expected return.  The plot focuses solely on risk mitigation at the moment.  Naturally, I intend to change that.

Time for the Math and Finance — Consider Yourself Warned

I mentioned a 30.5:1 (61:2) compression ratio. Like music and other data, financial information can be compressed. However, only so much compression can be achieved in a lossless manner. In audio compression, researchers have learned which portions of music and other audio can be “lost” without the listener telling the difference. There is a field, psychoacoustics, devoted to doing just that: modeling what the human ear (and brain) can hear, and what gets “masked” by various physiological factors.

Even more important than preserving fidelity is extracting meaning. One way of achieving that is by removing “noise.” The visualization software performs significant computation to maintain as much angular fidelity as possible. As it optimizes angles, it keeps track of total error vis-à-vis the covariance matrix. It also keeps track of each individual asset’s error (the reciprocal of fitness, i.e., lack of fit rather than fit).
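
The post does not spell out how the angles are optimized, so the following is a hypothetical sketch of the general idea, not the Sigma1 algorithm. It assumes each pairwise correlation ρ maps to a target angle arccos(ρ) (uncorrelated assets sit 90° apart, negatively correlated ones wider still), gives each asset a single direction in the plane, and runs a simple greedy coordinate search to shrink the squared angular error while tracking the total error and each asset’s share of it. Line lengths are not modeled, and all function names are illustrative.

```python
import math

def target_angles(corr):
    """Hypothetical mapping: correlation rho -> target angle arccos(rho), in radians."""
    n = len(corr)
    return [[math.acos(max(-1.0, min(1.0, corr[i][j]))) for j in range(n)]
            for i in range(n)]

def pair_angle(a, b):
    """Smallest angle between two directions a and b, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def errors(thetas, targets):
    """Total squared angular error, plus each asset's share of it."""
    n = len(thetas)
    total = 0.0
    per_asset = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            e = (pair_angle(thetas[i], thetas[j]) - targets[i][j]) ** 2
            total += e
            per_asset[i] += e  # each pair's error is charged to both assets
            per_asset[j] += e
    return total, per_asset

def fit_angles(corr, steps=200, step_size=0.1):
    """Greedy coordinate search over one direction per asset (asset 0 pinned at 0)."""
    targets = target_angles(corr)
    n = len(corr)
    # Start every asset at its target angle relative to asset 0.
    thetas = [0.0] + [targets[0][j] for j in range(1, n)]
    best, _ = errors(thetas, targets)
    for _ in range(steps):
        improved = False
        for i in range(1, n):
            for delta in (step_size, -step_size):
                trial = thetas[:]
                trial[i] += delta
                err, _ = errors(trial, targets)
                if err < best:
                    thetas, best, improved = trial, err, True
        if not improved:
            step_size /= 2  # no single move helped, so search more finely
    return thetas, errors(thetas, targets)

# Usage: three assets whose correlations fit directions 60 degrees apart.
corr = [[1.0, 0.5, -0.5],
        [0.5, 1.0, 0.5],
        [-0.5, 0.5, 1.0]]
thetas, (total, per_asset) = fit_angles(corr)
print([round(math.degrees(t), 1) for t in thetas], total)
```

With these three assets the fit is essentially exact. With four mutually uncorrelated assets, no planar layout can keep every pair at 90°, so some residual error always remains, which mirrors the projection error any 2-D map incurs.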

The real alchemy comes from the line-length computation.  It combines semi-variance data with various fitness factors to determine each asset line length.

Just like Mercator projections for maps incur unavoidable error when converting from a 3-D globe to a 2-D map, the portfolio asset visualizations introduce error as well.  If one thinks of just the correlation matrix and semi-variance data, each asset has a dimensionality of 8.5 (in the case of 16 assets).  Reducing from 8.5-D to 2-D is a complex process, and there are an infinite number of ways to perform such an operation!  The art and [data] science is to enhance the “signal” while stripping away the “noise.”

The ultimate goals of portfolio data visualization technology are:

1) Transform raw data into actionable insight

2) Preserve sufficient fidelity of relevant data such that the “map” can be used to reliably get to the desired “destination”

I believe that the first goal has been achieved.  I know what actions to take… trying various other securities to find those that can build a “higher-angle”, and arguably more robust, more resilient investment portfolio.

However, the jury is still out on the degree [no pun intended] to which goal #2 has or has not been achieved.  Does this simple 2-D map help portfolio builders reliably and consistently navigate the 8+ dimensional portfolio space?

What about 3-D Modelling and Visualization?

I started working with 2-D for one key reason — I can easily share 2-D images with readers and clients alike.  I want feedback on what people like and dislike about the visuals. What is easy to understand, what is not?  What is useful to them, and what isn’t?  Ironing out those details in 2-D is step 1.

Of course I am excited by 3-D. Most of the building blocks are in my head, and I can heavily leverage the 2-D algorithms.  I am, however, holding off for now. I am waiting for feedback from readers and clients alike.  I spend a lot of time immersed in the language of math, statistics, and finance.  This can create a communication gap that is best mitigated through discussion with other people with other perspectives.  I wish to focus on 2-D for a while to learn more about market needs.

That being said, it is hard to resist creating a 3-D portfolio asset visualizer. The geek in me is extremely curious about how much the error terms will reduce when given a third degree of freedom to work with.

The bottom line is: Please give me any feedback: positive, negative, technical, aesthetic, etc. This is just the start. I am extremely enthusiastic about where this journey will take me and my company.

Disclosure and Disclaimer

Securities mentioned in this post are holdings in my personal retirement accounts (e.g. 401K, IRA, Roth IRA) as of the day of initial publication of this post. The purpose of this post is to illustrate features of Sigma1 Financial software. This is NOT investment advice, and NOT a recommendation to buy, sell, or hold any securities. Please refer to the “Disclaimer” Tab of the main page of this site for further information.


Choosing your Crystal Ball for Risk

Choose Your “Perfect” Risk Model

I start with a hypothetical. You are choosing between three portfolios: A, B, and C. If you could know with certainty one of the following annual risk measures, which would you choose?

  1. Variance
  2. Semi-variance
  3. Max Drawdown

For me the choice is obvious: max drawdown. Variance and semi-variance are deliberately decoupled from return. In fact, we often say “variance” as shorthand for mean-return variance. Similarly, “semi-variance” is shorthand for mean-return semi-variance. For each variance flavor, mean returns (average returns) are subtracted out in the risk formula. The mathematical bifurcation of risk and return is deliberate.

Max drawdown blends return and risk. This is mathematically untidy — max drawdown and return are non-orthogonal. However, the crystal ball of max drawdown allows choosing the “best” portfolio because it puts a floor on loss.  Tautologically the annual loss cannot exceed the annual max drawdown.

Cheating Risk

My revised answer stretches the rules.  If all three portfolios have future max drawdowns of less than 5 percent, then I’d like to know the semi-variances.

Of course there are no infallible crystal balls.  Such choices are only hypothetical.

Past variance tends to be reasonably predictive of future variance; past semi-variance tends to predict future semi-variance to a similar degree.  However, I have not seen data about the relationship between past and future drawdowns.

Research Opportunities Regarding Max Drawdown

It turns out that there are complications unique to max drawdown minimization that are not present with MVO or semi-variance optimization. However, at Sigma1, we have found some intriguing ways around those early obstacles.

That said, there are other interesting observations about max drawdown optimization:

1) Max drawdown only considers the worst drawdown period; all other risk data is ignored.

2) Unlike V or SV optimization, longer historical periods increase the max drawdown percentage.

3) There is a scarcity of evidence on the degree (or lack) of relationship between past and future max drawdowns.

(#1) can possibly be addressed by using hybrid risk measures, such as combined semi-variance and max drawdown measures. (#2) can be addressed by standardizing max drawdowns; a simple standardization would be DDnorm = DD/num_years. Another possibility is DDnorm = DD/sqrt(num_years). (#3) requires research across different time periods, different countries, different market caps, etc.
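
As a minimal sketch, assuming a price (or total-return index) series, max drawdown and the two standardizations suggested above can be computed as follows. The price series is invented purely for illustration. It also shows observation #2: extending the window forward can only keep or raise the max drawdown.

```python
import math

def max_drawdown(prices):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = prices[0]
    worst = 0.0
    for p in prices:
        peak = max(peak, p)
        worst = max(worst, (peak - p) / peak)
    return worst

def normalized_drawdowns(dd, num_years):
    """The two simple standardizations: DD/num_years and DD/sqrt(num_years)."""
    return dd / num_years, dd / math.sqrt(num_years)

# Hypothetical month-end prices covering two years (made-up values).
prices = [100, 104, 101, 97, 99, 106, 110, 103, 95, 98, 102, 108,
          111, 104, 96, 92, 97, 103, 108, 113, 117, 115, 120, 124]

dd_1y = max_drawdown(prices[:12])
dd_2y = max_drawdown(prices)
print(f"1-year max drawdown: {dd_1y:.1%}")
print(f"2-year max drawdown: {dd_2y:.1%}")
print("2-year normalized (per year, per sqrt year):", normalized_drawdowns(dd_2y, 2))
```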

Also note that drawdown has many alternative flavors — cumulative drawdown, weighted cumulative drawdown (WCDD), weighted cumulative drawdown over threshold — just to name three.

Semi-Variance Risk Measure Reaching Critical Mass?

The bottom line is that early adopters have embraced semi-variance-based optimization, and the trend appears to be snowballing. For instance, Morningstar now calculates risk “with an emphasis on downward variation.” I believe that drawdown measures, either stand-alone or hybridized with semi-variance, are the future of post-post-modern portfolio theory.

Bye PMPT. Time for a Better Name! Contemporary Portfolio Theory?

I recommend starting with the acronym. I propose CPT or CAPT. Either could be pronounced “Capped”. However, CAPT could also be pronounced “Cap T”, as distinct from CAPM (“Cap M”). “C” could stand for either Contemporary or Current. And the “A” could stand for Advanced or Alternative, the former being a bit pretentious and the latter more diplomatic. I put my two cents behind CAPT, pronounced “Cap T”; you can figure out what you want the letters to represent. What are your two cents? Please leave a comment!

Back to (Contemporary) Risk Measures

I see semi-variance beginning to transition from the early-adopter phase to the early-majority phase. However, my observations may be skewed by the types of interactions Sigma1 Financial invites. I believe that semi-variance optimization will be mainstream in 5 years or less. That is plenty of time for semi-variance optimization companies to flourish. However, we’re also looking for the next next big thing in finance.

 

Semi-variance: Choosing the Best Formula

Unlike variance, there are several different formulas for semivariance (SV). If you are a college student looking to get the “right” answer on a test or quiz, the formula you are looking for is most likely:

Classic Semi-Variance Formula
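
Written out in LaTeX, assuming n periodic returns r_i with mean r̄ and a 1/n normalization (some texts divide by n − 1 instead), the classic form reads:

$$
SV = \frac{1}{n}\sum_{i=1}^{n}\left( r_i < \bar{r} \;?\; (r_i - \bar{r})^2 : 0 \right)
$$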

The question-mark-colon syntax simply means: if the expression before the “?” is true, then the term before the “:” is used; otherwise the term after the “:” is used. So a?b:c simply means choose b if a is true, else choose c. This syntax is widely used in computer science, but less often in the math department. However, I find it more concise than other formulations.

Another common semivariance formula involves comparing returns to a required minimum threshold rt.  This is simply:

Min Return Threshold SV
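
Assuming n returns r_i and the same 1/n normalization, the threshold version replaces the mean r̄ with the required minimum return r_t:

$$
SV_t = \frac{1}{n}\sum_{i=1}^{n}\left( r_i < r_t \;?\; (r_i - r_t)^2 : 0 \right)
$$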

Classic mean-return semivariance should not be directly compared to mean-return variance. However, a slight modification makes direct comparison more meaningful. In general, approximately half of mean-adjusted returns are positive and half are negative (a mean-adjusted return of exactly zero is relatively rare and has no impact on either formula). While mean-return variance always has n terms, semi-variance uses only a subset, which is typically of size n/2. Thus including a factor of 2 in the formula makes intuitive sense:

Modified Semi-Variance
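
Assuming n returns r_i with mean r̄ and a 1/n base normalization, the modified form simply doubles the classic sum:

$$
SV_{mod} = \frac{2}{n}\sum_{i=1}^{n}\left( r_i < \bar{r} \;?\; (r_i - \bar{r})^2 : 0 \right)
$$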

Finally, another useful formulation is one I call “Modified Drawdown Only” (MDO) semivariance. The name is self-explanatory: only drawdown events are counted. SVmdo requires neither ravg (r bar) nor rt. It produces nearly identical values to SVmod for rapid sampling (say, anything more frequent than daily data). For high-speed trading, it also has the advantage of not requiring all of the return data a priori, meaning it can be computed as each return data point becomes available, rather than retrospectively.

Modified Drawdown-Only Semi-variance
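
Reading a “drawdown event” as a period with a negative return (r_i < 0), an assumption consistent with SVmdo needing neither r̄ nor r_t, and keeping the factor of 2 and the 1/n normalization, the drawdown-only form becomes:

$$
SV_{mdo} = \frac{2}{n}\sum_{i=1}^{n}\left( r_i < 0 \;?\; r_i^2 : 0 \right)
$$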

Why might SVmdo be useful in high-speed trading? One use may be in put/call option pricing arbitrage strategies. Black–Scholes, to my knowledge, makes no distinction between “up-side” and “down-side” variance, and simply uses plain variance. [Please shout a comment at me if I am mistaken!] However, if put and call options are “correctly” priced according to Black–Scholes, but the data shows a pattern of, say, greater downside variance than normal variance on the underlying security, put options may be undervalued. This is just an off-the-cuff example, but it illustrates a potential situation for which SVmdo is best suited.
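
In a high-speed setting, the useful property of SVmdo is that it can be updated as each return arrives, with no stored history. A minimal sketch, assuming the drawdown-only form (2/n) times the sum of squared negative returns; the class name StreamingSVmdo is hypothetical:

```python
class StreamingSVmdo:
    """Running drawdown-only semi-variance, updated one return at a time.

    Assumes SVmdo = (2/n) * sum(r**2 for negative r), so only two running
    totals are kept: the count of returns and the sum of squared losses.
    """

    def __init__(self):
        self.n = 0              # returns seen so far
        self.sum_sq_down = 0.0  # running sum of squared negative returns

    def update(self, r):
        """Add one new return and return the current SVmdo estimate."""
        self.n += 1
        if r < 0:
            self.sum_sq_down += r * r
        return self.value()

    def value(self):
        return 2.0 * self.sum_sq_down / self.n if self.n else 0.0

# Usage with a few made-up tick-level returns.
tracker = StreamingSVmdo()
for r in [0.001, -0.002, 0.0005, -0.001]:
    print(tracker.update(r))
```

Because no mean is involved, each update costs constant time, whereas the classic and modified forms need the full sample mean before any term can be finalized.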

Pick Your Favorite Risk Measure

Personally, I slightly favor SVmdo over SVmod for computational reasons. They are often quite similar in practice, especially when used to rank risk profiles of a set of candidate portfolios. (The fact that both are anagrams of each other is deliberate.)

I realize that the inclusion of the factor 2 is really just a semantic choice. Since V and (classic) SV, amortized over many data sets, are expected to differ by a factor of 2, standard deviation, σ, and semideviation, σd, can be expected to differ by the square root of 2. I consider this mathematically untidy. By contrast, I consider SVmod to be the most elegant formulation.
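
A quick numerical check of the factor-of-2 argument, assuming normally distributed (hence symmetric) simulated daily returns. The function names are hypothetical; each mirrors one of the semivariance flavors described above, with Python’s `x if cond else y` playing the role of the ?: operator. On symmetric data, V/SV should land near 2, SVmod near V, and SVmdo fairly close to SVmod when the mean return is small.

```python
import random
import statistics

def variance(rs):
    """Mean-return variance with 1/n normalization."""
    m = statistics.fmean(rs)
    return sum((r - m) ** 2 for r in rs) / len(rs)

def sv_classic(rs):
    """Classic semi-variance: only returns below the mean contribute."""
    m = statistics.fmean(rs)
    return sum(((r - m) ** 2 if r < m else 0.0) for r in rs) / len(rs)

def sv_threshold(rs, rt):
    """Semi-variance against a minimum required return rt."""
    return sum(((r - rt) ** 2 if r < rt else 0.0) for r in rs) / len(rs)

def sv_mod(rs):
    """Classic semi-variance scaled by 2 to be comparable with variance."""
    return 2.0 * sv_classic(rs)

def sv_mdo(rs):
    """Drawdown-only form: 2/n times the sum of squared negative returns."""
    return 2.0 * sum((r * r if r < 0 else 0.0) for r in rs) / len(rs)

random.seed(42)
# Simulated daily returns: normal (symmetric), mean 0.03%, volatility 1%.
returns = [random.gauss(0.0003, 0.01) for _ in range(100_000)]

v = variance(returns)
print(f"V     = {v:.3e}")
print(f"SV    = {sv_classic(returns):.3e}  (V / SV = {v / sv_classic(returns):.2f})")
print(f"SVmod = {sv_mod(returns):.3e}")
print(f"SVmdo = {sv_mdo(returns):.3e}")
print(f"SV_t  = {sv_threshold(returns, 0.0):.3e}  (with rt = 0)")
```

On skewed real-world returns the ratios will drift away from these values, which is precisely the information semivariance is meant to capture.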

The Best Financial Models for Insight and Prediction?

The best models are not the models that fit past data the best, they are the models that predict new data the best. This seems obvious, but a surprising number of business and financial decisions are based on best-fit of past data, with no idea of how well they are expected to correctly model future data.

Instant Profit, or Too Good to be True?

For instance, a stock analyst reports to you that they have a secret recipe for making 70% annualized returns by simply trading KO (The Coca-Cola Company). The analyst’s model tells you the fill-or-kill (FOK) limit price, y, at which to buy KO stock at each market open. The stock is then always sold with a market order at the end of each trading day.

The analyst tells you that her model is based on three years of trading data for KO, PEP, the S&P 500 index, and aluminum and corn spot prices. Specifically, the model uses closing data from the two preceding days for each of those five series, so it has 10 inputs. Back-testing shows that the model would have produced 70% annualized returns over the past three years, a whopping 391% total return over that period. Moreover, the analyst points out that over 756 trading days, 217 trades would have been executed, with a profit 73% of the time the stock was bought.
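
The quoted back-test figures are internally consistent, as a quick check shows: compounding 70% a year for three years gives the 391% total return, and 217 trades over 756 trading days means the model trades on roughly 29% of days. Variable names are illustrative.

```python
annualized_return = 0.70
years = 3
total_return = (1 + annualized_return) ** years - 1
print(f"Total return over {years} years: {total_return:.1%}")  # 391.3%

trades, trading_days, win_rate = 217, 756, 0.73
print(f"Trades on {trades / trading_days:.1%} of trading days")
print(f"Roughly {round(trades * win_rate)} profitable trades")
```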

The analyst, Debra, says that the trading algorithm is already coded, and U.S. markets open in 20 minutes. Instant profit is only moments away with a simple “yes.” What do you do with this information?

Choices, Chances, Risks and Rewards

You know this analyst, and she has made your firm’s clients and proprietary trading desks a lot of money. However, you also know that while she is thorough and meticulous, she is also bold and aggressive. You decide that caution is called for, and allocate a modest $500,000 to the KO trading experiment. If after three months the KO experiment nets at least 7% profit, you’ll raise the risk pool to $2,000,000. If, after another three months, the KO experiment generates at least 7% again, you’ll raise the risk pool to $10,000,000 and let your firm’s best clients in on the action.

Three months pass, and the KO experiment produces good results: 17 trades, 13 winners, and a 10.3% net profit. You OK raising the risk pool to $2,000,000. After only two more months, the KO experiment has executed 13 trades, with 10 winners, and an 11.4% net profit. There is a buzz around the office about the “knock-out cola trade”, and brokers are itching to get in on it with client funds. You are considering giving the green light to the “Full Monty,” when Stan the Statistician walks into your office.

Stan’s title is “Risk Manager”, but people around the office call him Stan the Statistician, or Stan the Stats Man, or worse (e.g. “Who is the SS going to s*** on today?”)  He’s actually a nice guy, but most folks consider him an interloper.  And Stan seems to have clout with corporate, and he has been known to use it to shut down trades. You actually like Stan, but you already know why he is stopping by.

Stan begins probing about the KO-trade.  He asks what you know.  You respond that Debra told you that the model has an R-squared of 0.92 based on 756 days of back-tested data.  “And now?” asks Stan.  You answer, “a 76% success rate, and profits of around 21% in 5 months.”  And then Stan asks, “What is the probability that that profit is essentially due to pure chance?”

You know that the S&P 500 historically has had over 53% “up” days; call it 54% to be conservative. So stocks should follow suit. The probability of getting exactly 23 wins on KO out of 30 tries is C(30, 23)*0.54^23*(0.46)^7 = 0.62%. Getting at least 23 (23 or more wins) brings the probability up to about 0.91%. So you say 1/0.0091, or about one in 110.

Stan says, “Your math is right, but your conclusion is wrong. For one thing, KO is up 28% over the period, and has had 69% up days over that time.” You interject, “Okay, wait one second… so my math now says about 24%, or roughly a 1 in 4 chance.”
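
Both binomial calculations in this exchange can be reproduced with the Python standard library (`math.comb`, Python 3.8+). The sketch treats each trade as an independent coin flip with win probability p, which is exactly the simple null hypothesis being tested, not a model of KO’s actual behavior. The helper name prob_at_least is illustrative.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

wins, trades = 23, 30
for p_up in (0.54, 0.69):
    exact = comb(trades, wins) * p_up**wins * (1 - p_up) ** (trades - wins)
    tail = prob_at_least(wins, trades, p_up)
    print(f"p = {p_up}: P(exactly {wins}) = {exact:.2%}, "
          f"P(>= {wins}) = {tail:.2%}, about 1 in {1 / tail:.1f}")
```

Swapping the market-wide 54% for KO’s own 69% up-day rate moves the result from roughly 1 in 110 to roughly 1 in 4, which is Stan’s point: the choice of null hypothesis matters as much as the arithmetic.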

Stan smiles, “You are getting much closer to the heart of the matter. I’ve gone over Debra’s original analysis, and have made some adjustments. My revised analysis shows that  there is a reasonable chance that her model captures some predictive insight that provides positive alpha.”  Stan’s expression turns more neutral, “However, the confidence intervals against the simple null hypothesis are not as high as I’d like to see for a big risk allocation.”

Getting all Mathy? Feedback Requested!

Do you want to hear more from “Stan”? He is ready to talk about adjusted R-squared, block-wise cross-validation, and data over-fitting. And why Debra’s analysis, while correct, was also incomplete. Please let me know if you are interested in hearing more on this topic.

Please let me know if I have made any math errors yet (other than the overtly deliberate ones).  I love to be corrected, because I want to make Sigma1 content as useful and accurate as possible.