Note: this is based on Malt Liquidity 33 (2/19/21), The Data Valuation Paradox (5/5/24), and My System (8/26/23).
Previous editions of the TWT newsletter can be found here (8-6-24) and here (8-8-24).
On: Systematic Trading, Position Sizing, Statistics
A core goal of trading in the long run is to “systematize” what you are doing. Certainly, we are all familiar with the idea of backtesting, Sharpe ratios, and waiting for signals to fire in profitable setups. But how exactly are we thinking about profit over time, or “using” our metrics? Consider the case of IBM’s Watson — a pre-OpenAI-mania project that IBM intended to turn into an AI doctor to assist physicians and speed up the healthcare process. Watson Health, however, was a big failure; what gives?
Well, Watson’s failure looked like a classic case of chucking refine-through-error iterative statistics and gobsmacking amounts of data at a complex problem with neither rigid inputs nor a recognizable output. The problem wasn’t that the product was bad, but that IBM intended to use Watson to optimize the accuracy of a diagnosis. Does this really make sense as a value-add point? Diagnosis isn’t a binary option or an all-or-nothing process: doctors have chances to fix erroneous initial diagnoses in follow-ups, and patients have the right to get a second opinion (and often do). Similarly, I see the same issue frequently arise when people try to assess the efficacy of their strategies — rather than looking at why their strategy works, they rely on all sorts of metrics and long-run assessors that fail to grasp that statistics don’t apply to individuals in a population. Anything can happen in an n=1 scenario — the singular outcome neither indicts nor validates the strat as a whole.
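To make the n=1 point concrete, here’s a minimal sketch (assuming a hypothetical strategy with a 55% win rate and symmetric 1R payouts, numbers chosen purely for illustration): a single trade loses nearly half the time, while a thousand-trade run almost never does.

```python
import random

random.seed(7)

# Hypothetical edge: 55% win rate, win +1R / lose -1R.
def pnl(n_trades):
    return sum(1 if random.random() < 0.55 else -1 for _ in range(n_trades))

single = [pnl(1) for _ in range(10_000)]
long_run = [pnl(1_000) for _ in range(10_000)]

# A losing trade says nothing; a losing 1,000-trade run says a lot.
print(f"P(one trade loses):        {sum(r < 0 for r in single) / 10_000:.2f}")
print(f"P(1,000-trade run loses):  {sum(r < 0 for r in long_run) / 10_000:.4f}")
```

The same strategy that validates beautifully over the long run still hands you an n=1 loss about 45% of the time.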
From “The Data Valuation Paradox”:
The divergence between math and humans arises from intent, methodology, and scalability. Applied probabilistic and statistical methods are just processes that can be optimally applied to certain problems. Which method you apply depends on factors such as the level of precision desired; the availability, amount, and cleanliness of data; and how fast you want the result. Humans apply this type of thinking contextually. Beyond specialized activities, this usually means these capabilities are only used when relevant to the individual. Humans are also creatures of evolution — these capacities were developed many millennia before the advent of computing. Here lies the crux of the confusion: human thinking was never meant to scale with the amount of data available. Rather, human intuition was honed on balancing accuracy against the available information and the speed with which the estimate needed to be made. Human ability is unparalleled when an approximation needs to be made on very little data, which is why casino games trap so many people. Unless they train the instinct out, humans simply cannot process the “long run” without putting it in the context of themselves, which is why you can find endless numbers of people convinced that they’ve “solved” roulette, hawking “their strategies.”
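A quick simulation of the classic “solved roulette” system (a martingale: double the bet after every loss) shows exactly how the instinct gets trapped. Under the illustrative session parameters below, the vast majority of sessions end green, yet the average P&L per session is still negative:

```python
import random

random.seed(42)

# European roulette: a red bet wins with probability 18/37.
# Martingale "system": double the bet after every loss, reset after a win,
# walk away up $100 or when you can no longer cover the next bet.
def session(bankroll=1_000, base_bet=10, target=100):
    wealth, bet = bankroll, base_bet
    while wealth < bankroll + target and bet <= wealth:
        if random.random() < 18 / 37:
            wealth += bet
            bet = base_bet
        else:
            wealth -= bet
            bet *= 2
    return wealth - bankroll

results = [session() for _ in range(100_000)]
print(f"Winning sessions:    {sum(r > 0 for r in results) / len(results):.1%}")
print(f"Average P&L/session: {sum(results) / len(results):+.2f}")
```

Seen one session at a time, the system “works”; scaled to the long run, the house edge grinds through anyway, which is precisely the instinct that has to be trained out.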
Stock market data is pretty much the noisiest dataset out there, because each price tick can concurrently be noise and signal depending on your filtration method. (Hence, beware of unfalsifiable predictions; they’re made by false prophets.) To trade properly, our trades need to “make sense” in our own heads, from a vantage point beyond validation through metrics. How I do this is by taking a “green-to-tee” approach (to borrow from Tiger Woods’ golf methodology) when I develop strategies:
Exploratory data analysis is fine, but don’t just number-crunch to try and “find edge.” Correlations exist everywhere in finance — every product is tied together to some degree — so raw numbers won’t provide insight without context as to why a given correlation is significant. (See the first sketch after this list.)
Once you have found potential edge, find the optimal case for your trade: “where does my edge work best?” This is the “green-to-tee” philosophy: start from the optimal conditions, find what breaks as you iterate away from the optimal scenario, and slowly widen the thresholds. (A good example of this is that signals tend to be much noisier on SPY than on ES, while the opposite is true for NQ versus QQQ, due to the relative value in underlying liquidity versus fluid tick size.) Start with the simplest version of your idea — the tap-in putt — and then add all the nonsense on top, as every step of math or additional product you add increases the potential noise exponentially. (See the second sketch below.)
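On the first point, here is a minimal sketch of why raw number-crunching “finds” edge everywhere (synthetic data, not real prices): correlate the price levels of two completely independent random walks and you will routinely get numbers that look significant, even though there is nothing there by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent random walks standing in for two unrelated "prices".
# By construction, there is zero relationship between them.
corrs = []
for _ in range(1_000):
    walk_a = rng.standard_normal(252).cumsum()
    walk_b = rng.standard_normal(252).cumsum()
    corrs.append(np.corrcoef(walk_a, walk_b)[0, 1])

corrs = np.abs(corrs)
print(f"median |corr| between independent walks: {np.median(corrs):.2f}")
print(f"share with |corr| > 0.5:                 {np.mean(corrs > 0.5):.1%}")
```

If a screen over a few hundred products can cough up correlations like these out of pure noise, the number alone can’t be the insight; the “why” has to come first.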
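And on the second point, a sketch of the widening itself, using a synthetic mean-reverting series (an OU-style process, chosen so that a stretched z-score really is the optimal case): fading the move has the most edge per trade at the tightest threshold, and you can watch exactly how much of it survives as you loosen up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic mean-reverting series: the "optimal case" for a fade
# is a stretched displacement from the mean.
n = 100_000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.95 * x[t - 1] + rng.standard_normal()

z = x / x.std()            # how stretched we are right now
fwd = np.roll(x, -1) - x   # next-step move
fwd[-1] = 0.0

# Start at the tap-in putt (tight threshold), then widen it and watch
# the per-trade edge decay while the trade count balloons.
for thresh in (3.0, 2.0, 1.0, 0.5):
    mask = np.abs(z) > thresh
    edge = np.mean(-np.sign(z[mask]) * fwd[mask])  # P&L of fading the stretch
    print(f"|z| > {thresh}: trades = {mask.sum():6d}, edge/trade = {edge:+.4f}")
```

Deciding where to stop widening is exactly the judgment call the metrics can’t make for you.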
All of this is why pure black boxes tend to backtest much better than they perform live. When you don’t know what the output of a given strategy is supposed to look like (is Sharpe supposed to be optimized? Is it benchmarked to outperforming the S&P? High expectancy, low trade frequency?), how could you possibly account for overfitting and overreading into signals, and avoid messing up live execution? The noise iteratively gets worse and your trading signals become more and more overfit the more you try to rationalize mathematical output in a human context.
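You can reproduce the black-box failure mode on demand. A minimal sketch (pure noise by construction, so there is nothing to find): grid-search a moving-average crossover over a thousand-odd parameter pairs, pick the best in-sample Sharpe, and then look at it out-of-sample.

```python
import numpy as np

rng = np.random.default_rng(2)

# Pure noise returns: any "edge" found below is overfit by definition.
rets = rng.standard_normal(2_000) * 0.01
prices = np.cumsum(rets)
train, test = slice(0, 1_000), slice(1_000, 2_000)

def trailing_ma(x, w):
    # Causal moving average: mean of the last w points (expanding warm-up).
    c = np.cumsum(x)
    ma = np.empty_like(x)
    ma[:w] = c[:w] / np.arange(1, w + 1)
    ma[w:] = (c[w:] - c[:-w]) / w
    return ma

def positions(prices, fast, slow):
    # Long when fast MA > slow MA, short otherwise; lagged one bar.
    sig = np.where(trailing_ma(prices, fast) > trailing_ma(prices, slow), 1.0, -1.0)
    return np.concatenate(([0.0], sig[:-1]))

def sharpe(x):
    return x.mean() / x.std() * np.sqrt(252)

params = [(f, s) for f in range(2, 30) for s in range(31, 120, 2)]
best = max(params, key=lambda p: sharpe(positions(prices, *p)[train] * rets[train]))

pos = positions(prices, *best)
print(f"best (fast, slow) in-sample: {best}")
print(f"in-sample Sharpe:  {sharpe(pos[train] * rets[train]):+.2f}")
print(f"out-of-sample:     {sharpe(pos[test] * rets[test]):+.2f}")
```

On most seeds, the in-sample winner posts a respectable-looking Sharpe while the out-of-sample number collapses toward a coin flip. Every extra knob you let the optimizer turn makes that gap worse.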
Consequently, the two most frequent mistakes I see when people try to trade a system are oversizing and overtrading. You don’t need a precise mathematical understanding of the Kelly Criterion to understand that if you size to the point where one losing trade ruins your ability to make back the losses in the long run, you’re trading too much size. On the flip side, if we treat all trades the same, we’re not taking advantage of the best, most likely setups. So some sizing criterion distinguishing “good” setups from “OK” setups is worth “systematizing.” (You can refer back to last Thursday’s newsletter for some tips on adjusting exposure and stops.) This is essentially the logic behind why counting cards works — you want to bet more when the count is good.
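For the binary-bet case, the Kelly fraction is just f* = p - q/b (win probability p, loss probability q, payout ratio b), and a quick simulation shows why the oversizing warning has teeth. Assuming an illustrative 55%-win-rate, even-money bet, betting twice Kelly doesn’t just double the swings; it wipes out the compounding entirely:

```python
import numpy as np

rng = np.random.default_rng(3)

# Kelly fraction for a binary bet: f* = p - q / b
p, b = 0.55, 1.0                  # 55% win rate, even-money payout
kelly = p - (1 - p) / b           # = 0.10 of bankroll per bet

def median_terminal_wealth(frac, n_bets=1_000, trials=2_000):
    wins = rng.random((trials, n_bets)) < p
    growth = np.where(wins, 1 + frac * b, 1 - frac)
    return np.median(np.prod(growth, axis=1))

for label, frac in [("half Kelly", kelly / 2), ("full Kelly", kelly),
                    ("2x Kelly", 2 * kelly), ("4x Kelly", 4 * kelly)]:
    print(f"{label:>10} (f={frac:.2f}): median terminal wealth = "
          f"{median_terminal_wealth(frac):10.2f}x")
```

Full Kelly maximizes long-run growth, 2x Kelly grinds you back to roughly flat, and 4x Kelly is ruin; the “bet more when the count is good” logic only works if the base size leaves you alive.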
Overtrading, on the other hand, is more psychological — it’s a function of boredom. While the selection bias in this article is atrocious, it does accidentally make a good point: most of the S&P’s returns come from a handful of large trading days a year. Similarly, unless you are market-making and capturing microscopic edge day in and day out, it’s likely that the bulk of your yearly gains will come from a handful of trading days. So why do we trade every day? For the obvious reason that we don’t know when the best days will come: most of why I write, read, and otherwise pontificate is because I’m waiting for those opportunities to come about. The second reason is that, while each trade is not going to be a home run, consistently building up a bankroll allows us to compound and bet more (you can tell that’s a favorite phrase of mine) without introducing a serious risk of ruin on amazing trade setups. Being great at trading, in a sense, is being alright with being bored. The more you stare at charts while bored, the more likely you are to see something that isn’t there. This is a bit of systematization that doesn’t require any math at all — just don’t take the null-EV trades, and let long-term compounding take effect. If you want to spin the roulette wheel, do it where you get comps, not with the market.
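You don’t need the linked article’s cherry-picked data to see the concentration effect; it falls straight out of fat tails. A minimal sketch with synthetic Student-t daily P&L (illustrative only, not real S&P data):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic fat-tailed daily P&L (Student-t, df=3). Illustrative only.
years, days = 2_000, 252
daily = rng.standard_t(df=3, size=(years, days))

gross_gains = np.where(daily > 0, daily, 0.0).sum(axis=1)   # sum of all up days
best_10 = np.sort(daily, axis=1)[:, -10:].sum(axis=1)       # 10 best days

share = best_10 / gross_gains
print(f"10 best days out of {days} carry a median "
      f"{np.median(share):.0%} of the year's gross gains")
```

Ten days is 4% of the year carrying a wildly disproportionate share of the gains; the rest of the days exist so that you’re solvent, sized up, and paying attention when those ten arrive.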