Feeling Lucky? The Prequel (aka the Back Testing Problem)


A couple of years ago, I took the 14-hour Qantas flight from Los Angeles to Sydney and back.

For a slow reader like me, 14 hours is the perfect time to polish off a book from cover to cover. On the way to Sydney I read Thomas Schelling's Micromotives and Macrobehavior. The hair on the back of my neck stood up; I got goosebumps!

Schelling's analysis is so clean and simple, yet packs a tremendous conceptual wallop. I was instantly reminded of Einstein's Investigations on the Theory of the Brownian Movement and Feynman's QED. These books make each step seem so simple, yet when you string them all together, the result is substantial, deep, and beautiful.

Schelling's book took me back to the first time I laid eyes on Euclid's axioms. Before I knew it, we had landed and I was in a long taxi line that was winding its way through a temporary building outside the terminal.

On my way back to Los Angeles, I read Nassim Taleb's Fooled by Randomness. It's a relatively slim book. But it kept me occupied the whole flight. In fact, it got me angry. I was cursing out loud! (I come from the How Late It Was, How Late school of cursing.)

I think it was Bertrand Russell who wrote this one-line book review: "I should think the covers of this book are too far apart." That summed it up for me. Fooled by Randomness could have been a great book. Instead, the tragically hip pseudo-intellectual poseur shtick beat the life out of it.

Here's what it could have been.

A Précis of Fooled by Randomness

  1. Any data set can be “explained” by an infinite number of patterns (Choice and Chance, by Brian Skyrms). [This has hugely interesting implications -- more on this at the end]

  2. Use Monte Carlo analysis to uncover the role of randomness:

    • Rule out alternative explanations (ones that have very low probability)

    • Determine the relative strength of the current explanation

    • Find interdependencies of variables in certain permutations

    • Find missing variables

  3. Interesting results from probability theory that should inform your reasoning:

    1. Dependence of probability on time scale

    2. Counterintuitive results due to skewness and asymmetry

  4. Common mistakes in probabilistic inference (the best summary of results is Scott Plous' The Psychology of Judgment and Decision Making)
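Item 3.2 deserves a number. A one-line sketch in Python (the payoffs are hypothetical, chosen only to make the asymmetry vivid): a bet that pays off 99.9% of the time can still have a sharply negative expected value.

```python
# A trade that wins almost every time can still be a losing trade.
# Hypothetical payoffs: small gain nearly always, rare catastrophic loss.
p_win, gain = 0.999, 1.0
p_lose, loss = 0.001, -10_000.0

expected_value = p_win * gain + p_lose * loss
print(f"expected value per play: {expected_value:.3f}")
```

A trader watching only the hit rate sees 999 wins out of 1,000 and concludes the strategy works; the expectation, about -9 per play, says otherwise. That is the skewness trap in miniature.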

Main Point: Even the most savvy Wall Street traders make huge mistakes when they reason about statistics. In particular, they grossly underestimate the role of sheer luck in the results they achieve.

Now that would have been a great book! Not wholly original, but highly informative.

To give you a sense of what this randomness is like (and how Monte Carlo analysis can illuminate it), here is a quote from Leonard Mlodinow ("The Triumph of the Random," WSJ, July 3-5, 2009, page W1):

"A few years ago Bill Miller of the Legg Mason Value Trust Fund was the most celebrated fund manager on Wall Street because his fund outperformed the broad market for 15 years straight. It was a feat compared regularly to DiMaggio’s, but if all the comparable fund managers over the past 40 years had been doing nothing but flipping coins, the chances are 75% that one of them would have matched or exceeded Mr. Miller’s streak. If Mr. Miller was really merely the goddess of Fortune’s lucky beneficiary, then one would expect that once the streak ended there would be no carryover of his apparent golden touch. In that expectation Mr. Miller did not disappoint: in recent years his fund has significantly lagged the market as he bet on duds like AIG, Bear Stearns, Merrill Lynch & Co. and Freddie Mac."

So how lucky are you? To find out, read the sequel.

Post Script (Added July 30, 2009)

An excellent example of how a data set can be "explained" by any old hypothesis appeared in the Wall Street Journal earlier this month (Data Mining Isn't a Good Bet for Stock-Market Predictions by Jason Zweig).

Here are some excerpts from the article that make the point (my emphasis) -- enjoy!

An entertaining new book, "Nerds on Wall Street," by the veteran quantitative money manager David Leinweber, dissects the shoddy thinking that underlies most of these techniques.

The stock market generates such vast quantities of information that, if you plow through enough of it for long enough, you can always find some relationship that appears to generate spectacular returns -- by coincidence alone. This sham is known as "data mining."

Every year, billions of dollars pour into data-mined investing strategies. No one knows if these techniques will work in the real world. Their results are hypothetical -- based on "back-testing," or a simulation of what would have happened if the manager had actually used these techniques in the past, typically without incurring any fees, trading costs or taxes....

Mr. Leinweber got so frustrated by "irresponsible" data mining that he decided to satirize it. After casting about to find a statistic so absurd that no sensible person could possibly believe it could forecast U.S. stock prices, Mr. Leinweber settled on annual butter production in Bangladesh. Over an 13-year period, he found, this statistic "explained" 75% of the variation in the annual returns of the Standard & Poor's 500-stock index.

By tossing in U.S. cheese production and the total population of sheep in both Bangladesh and the U.S., Mr. Leinweber was able to "predict" past U.S. stock returns with 99% accuracy.
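Leinweber's butter-in-Bangladesh result is easy to reproduce in spirit: generate a purely random "market" series, then search a large pile of equally random candidate "predictors" for the one that happens to fit best. Everything below is synthetic data, sketched for illustration.

```python
import random

def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
returns = [rng.gauss(0, 1) for _ in range(13)]  # 13 "annual returns": pure noise

# Mine 10,000 equally meaningless candidate series for the best fit.
best_r = max(
    abs(correlation([rng.gauss(0, 1) for _ in range(13)], returns))
    for _ in range(10_000)
)
print(f"best |r| found by search: {best_r:.2f}")
```

With only 13 data points and enough candidates, the search reliably turns up strong correlations between two series that are, by construction, unrelated. That is all "explained 75% of the variation" means when the explanation was chosen after the fact.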

The Problem of Induction is the problem of demonstrating that the future will be like the past. It can't be done via deductive argument (because by definition, induction takes a leap beyond what is contained in the premises), and it can't be done via induction because that would be circular.

But even if you could solve the problem of induction and demonstrate that the future will resemble the past, it won't help solve the back-testing problem. To solve that, you have to be able to show in what ways the future will resemble the past. There is a super-interesting problem lurking here that goes by the name of the Grue Paradox.

To summarize: back-testing alone is insufficient to demonstrate the validity of a hypothesis. It's insufficient because I can cook up an infinite number of hypotheses that will fit the pattern in question. I need a way to sort out good hypotheses from bad. Without that "goodness filter" I don't know if the "pattern" identified will repeat in the future.
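The "infinite number of hypotheses" claim isn't rhetorical flourish; it can be exhibited constructively. Take any finite data set: the unique low-degree polynomial through the points fits it exactly, and so does that polynomial plus c·(x-x1)(x-x2)...(x-xn) for every real number c. A sketch (the data values are made up):

```python
def lagrange(xq, xs, ys):
    """Value at xq of the unique degree-(n-1) polynomial through (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (xq - xj) / (xi - xj)
        total += term
    return total

xs = [1.0, 2.0, 3.0, 4.0]  # four observations -- any "pattern" at all
ys = [2.0, 3.5, 1.0, 4.2]

def hypothesis(c):
    """A different curve for every real c; all agree perfectly on the data."""
    def h(x):
        bump = 1.0
        for xi in xs:
            bump *= (x - xi)  # zero at every observed point
        return lagrange(x, xs, ys) + c * bump
    return h

# Every member of the family back-tests perfectly...
for c in (0.0, 1.0, -3.7, 100.0):
    h = hypothesis(c)
    assert all(abs(h(x) - y) < 1e-9 for x, y in zip(xs, ys))

# ...yet they disagree wildly about the next point, x = 5:
print([round(hypothesis(c)(5.0), 1) for c in (0.0, 1.0, -3.7, 100.0)])
```

This is precisely why a perfect back-test proves nothing by itself: every hypothesis in the family has an identical back-test, and without a goodness filter (simplicity, a causal story, out-of-sample validation) there is no way to privilege one prediction over another.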

Filed in: Software Quality