How Should We Use Chess Engines?
Reflections on watching tournament chess (adapted from the Caruana-Carlsen match)
Around the time of the World Chess Championship, the amount of interest in chess usually goes far beyond tournament players. Of course, this is a good thing. Free internet broadcasts have my most casual chess-playing friends following along and pinging me with questions.
However, with how powerful the average individual's laptop is nowadays, the strongest chess computers in the world are also available to everyone. On one hand, this allows the average player (and above-average ones too!) to get a general sense of how the game is going. On the other hand, I see drastic differences in how people watch, react, and even commentate on games now.
First, we take a look at the media.
As someone with a background in math, a lot of my current and past hobbies and work revolve around calculating odds and working with data to provide insights on markets, sports, games, and more. It was interesting to see the mainstream “sports analytics” blogs pick up coverage of the Caruana-Carlsen match from a few years ago, where a senior writer was assigned to cover each game. I noticed a glaring flaw in their article narrative, though:
Running on my laptop, Stockfish, the powerful chess engine, assesses black — Caruana, in this case — with about a half-pawn disadvantage after the first two moves. Nevertheless, Caruana won the game.
To me, this shows a fundamental misunderstanding of what a computer evaluation is telling us, and it reduces each position to a number. In fact, a computer evaluation in the opening is mostly meaningless, other than to flag clear blunders or clearly inferior openings. White inherently has an advantage because they get to move first. As you'll notice, computers put this at roughly +0.3 to +0.7 across the board, depending on which line has been chosen. However, this is by no means indicative of the actual strength of the position! So much of the game is still undecided that drawing conclusions from a number, before the position has even been fleshed out, is ridiculous.
While this is obviously a small example, the mentality spills over to the broader spectating community. Most of the chat streams I see are filled with users treating the engine evaluation as gospel truth and using it as a crutch to say "Ian blundered" or "Ding blundered" when the evaluation has moved by less than 0.5. Most stronger GMs doing commentary tend to steer away from using the engine to assess positions, but some of the 'amateur-friendly' streams have a nasty habit of going over engine lines exclusively and attempting to interpret the computer rather than assessing the position from their own human perspective. That brings me to my second point: computer chess is not human chess.
Undoubtedly, computers are better than humans at chess. The computing boom has created engines that are nigh-unbeatable, and humans aren't ever going to catch up skill-wise. Thus, it's absolutely necessary to run your preparation through a computer. How exactly are we supposed to interpret what the computer is saying, though?
There are plenty of top-level games I've been watching where the computer is suggesting one move but the player has clearly been planning on going another direction. It's almost impossible to understand why computers play the way they do in certain positions, which makes players like Magnus, who can tap into a computer-like instinct in some positions to find ridiculous ideas, such a pleasure to watch.
With the development of AlphaZero, Leela Chess Zero, and other deep learning engines, the spheres of 'computer chess' and 'human chess' have merged to some extent, and we get extravagant preparation combined with human intuition to follow the game to its best conclusion. A lot of the time, however, it's completely overlooked that computer chess and human chess are two separate things entirely. An engine, in essence, is based on pure calculation. A lot of the modern ones have opening books built in and endgame tablebases worked out, but the raw function of evaluating a position starts with narrowing down the possible 'good moves' and calculating them out as far as the processor allows. At the end of the day, it's a "compare, search, and sort" brute-force search tree problem.
Humans, on the other hand, don't follow this method at all. Sure, there is some calculation, and sure, we have to find a way to narrow down our candidate moves, but no strong chess player can succeed without some level of intuitive ability: that sense that, for some reason, this move just looks right. Human skill in chess is reflected in pattern recognition, in quickly working out what to focus on in a position before calculating. No search function can replicate that hunch, and no amount of human calculating ability can match the computer's. The computer half and the human half that make up chess are, for the time being, essentially cut off from each other. The question then becomes “how should we use chess computers?”
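For readers who want to see what that "compare, search, and sort" idea looks like in practice, here is a minimal sketch of minimax search with alpha-beta pruning on a made-up game tree. It is not how Stockfish or any real engine is implemented; the tree, the scores, and the names `Tree`, `alphabeta`, and `toy_tree` are all my own assumptions for illustration.

```python
from typing import List, Union

# A toy game tree (hypothetical data): a leaf is a static evaluation in pawns
# from White's point of view; an inner node is the list of positions
# reachable in one move.
Tree = Union[float, List["Tree"]]


def alphabeta(node: Tree, white_to_move: bool,
              alpha: float = float("-inf"),
              beta: float = float("inf")) -> float:
    """Return the evaluation the side to move can force from this node."""
    if not isinstance(node, list):
        # Leaf: nothing left to calculate, just report the static score.
        return float(node)

    if white_to_move:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # Black already has a better option elsewhere.
        return best

    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # White already has a better option elsewhere.
    return best


# White chooses among three candidate moves; Black then picks a reply.
toy_tree: Tree = [[0.3, 0.7], [-1.2, 2.5], [0.5, 0.4]]
print(alphabeta(toy_tree, white_to_move=True))  # prints 0.4
```

Note that the second candidate contains the flashiest outcome (+2.5), yet the search throws it out the moment it sees Black's -1.2 reply. The number that comes back assumes perfect play from both sides, which is exactly the assumption that does not hold for humans.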
To stop myself from sounding like too much of a chess Luddite, I want to make it clear that I'm not advocating that we stop watching games with engines up or stop using them to go over our games. But I do want to advocate for a healthier use of engines rather than the near-mindless acceptance that the evaluation is everything. Most stronger players I know who follow and play chess seriously tend to use an engine evaluation as nothing more than a relative strength gauge. Statements like "White has an advantage of +0.7" become "White can press in this position" or "Black needs to play accurately to hold", and lead to a more organic search for ideas.
Sure, sometimes a position becomes clearly winning contingent on one side finding a move. However, even though the engine reveals it to the audience, missing a win isn't inherently a 'blunder', as sometimes it's not humanly possible to see things in a given situation. There’s a practical element to playing chess, in that you generally don’t blindly leap into complications without having an inkling of where they lead. Engines have no context of stakes, time pressure, positional complexity, or mental state; they can only spit out what their algorithms have evaluated to be best. As such, I avoid using engines as much as possible when observing games, and most strong players I know prefer to watch chess in this manner as well. Engine evaluations can be deceptive: sometimes winning is just about having an easier position to play than your opponent, who eventually can't hold theirs together. An evaluation might call a position equal, but as we saw in Game 4 of the current match, would a single human player prefer to play Black over White here?
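As a rough illustration of reading an evaluation as a strength gauge rather than a verdict, here is a small sketch that turns a score into the kind of statement described above. The function name `describe_eval` and every cut-off in it are hypothetical choices of mine, not a standard; treat them as placeholders.

```python
def describe_eval(pawns: float) -> str:
    """Turn an engine score (in pawns, positive = White better) into words."""
    side, other = ("White", "Black") if pawns >= 0 else ("Black", "White")
    size = abs(pawns)

    # Hypothetical cut-offs, chosen only for illustration.
    if size < 0.3:
        return "Roughly balanced; both sides have play"
    if size < 1.0:
        return f"{side} can press; {other} needs to play accurately to hold"
    if size < 3.0:
        return f"{side} is clearly better, provided the right moves are found"
    return f"{side} should be winning with accurate play"


print(describe_eval(0.7))
# prints: White can press; Black needs to play accurately to hold
```

Even this says nothing about which side is easier to play, how much time is on the clock, or what is at stake, so it is a starting point for looking at the board and not a replacement for it.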
Engine analysis and human comfort are pretty unrelated concepts at times. There are many, many all-time great players who are totally comfortable with “unsound” sacrifices and engine disadvantages because it’s just so hard to play the other side under realistic conditions. This is lost on the computer prep generation, but it was very much my bread and butter when I still played competitively. Unfortunately, it’s not possible to be competitive without playing computer moves at this point, which is a large reason why I don’t play chess competitively anymore, but you can preserve some of the joy of humanity by not blindly following Sesse.