Lost in the Endgame

Sherwood, Russell  Thursday, April 20, 2017

Is there a Good, Better, Best Engine in the Endgame?

When it comes to engines, the second most common question asked is “Which engine is best in the X?” – the “X” being the Opening, Middlegame or Endgame.

It is generally held that engines are still relatively weak in the endgame, and this is demonstrated in a number of games where the evaluation is significantly incorrect (typically showing a win/loss for what is actually a draw).

This raises four interesting points for the CC Player:

  1. Which kinds of positions does the engine typically misevaluate?

  2. Which Engines tend to be better in the endgame?

  3. Do any combinations of Engines improve the overall analysis?

  4. Can I leverage any of this knowledge in my own games?


 

Misevaluation

The most common errors tend to occur in positions which, when stripped down bare, revolve around zugzwang or fortresses. Misevaluation also arises because engines use generic evaluation methods, which will generate “good” moves but not necessarily the absolute best move. In endgame analysis, the engine often relies on search depth rather than knowledge to find the win, but the required winning move is frequently beyond its horizon.


 

Engines

My first analysis used a large EPD file containing 432 varied endgame positions – some simple, some very difficult. This analysis showed two interesting outcomes:

(1) Even the best engines scored only just over 80% – or, more importantly, got around 20% of positions wrong. That is not to say these moves were blunders; many were second best, but not optimal.

(2) There was very little difference at the top, with Komodo (10.4) and Stockfish (in the form of Matefinder) out in front and Houdini 5 just behind – we then see a chasing pack with Deep Shredder, Fritz 15 and older versions of Komodo and Stockfish. Looking at some historical analysis

https://sites.google.com/site/computerschess/scct-ets-all

we can see that Komodo, in its most recent versions, appears to have made up ground on Stockfish and Houdini.

What is interesting is that each new version of the Big 3 tends to advance the score by only 1-2%, this being due to improvements in general evaluation picking up a few extra “correct” moves, rather than any additional endgame knowledge being added to the engine.

This knowledge is what tends to make the difference in endgame analysis – even a fairly weak human player knows that opposite-coloured bishop endings are generally drawish, but an engine that is a pawn ahead without this built in will happily trade down, evaluating the game as won!

That knowledge, rather than search, is the key factor here is supported by running the tests again with significantly increased times – as expected, the scores increased only slightly.
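Scoring an engine against a suite of this kind amounts to comparing its chosen move with the `bm` (best move) opcode of each EPD record. A minimal sketch in Python of such a scorer – the positions, ids and engine answers below are invented for illustration, not the actual test data:

```python
def parse_epd(line):
    """Split an EPD record into its position and its opcodes.

    An EPD line holds the first four FEN fields, followed by
    semicolon-terminated opcodes such as 'bm' (best move) and 'id'.
    """
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    ops = {}
    if len(fields) == 5:
        for op in fields[4].split(";"):
            op = op.strip()
            if op:
                name, _, value = op.partition(" ")
                ops[name] = value.strip().strip('"')
    return position, ops

def score_engine(epd_lines, engine_moves):
    """Count how many positions the engine answered with the 'bm' move.

    engine_moves maps each position's EPD 'id' to the move the engine
    actually chose (both in SAN).
    """
    solved = 0
    for line in epd_lines:
        _, ops = parse_epd(line)
        if "bm" in ops and engine_moves.get(ops.get("id")) == ops["bm"]:
            solved += 1
    return solved

# Illustrative records only -- not positions from the real test set.
suite = [
    '4k3/8/8/8/8/8/8/4K3 w - - bm Kd1; id "p1";',
    '4k3/8/8/8/8/8/8/4K3 w - - bm Ke2; id "p2";',
]
engine_answers = {"p1": "Kd1", "p2": "Kf1"}  # hypothetical engine output
print(score_engine(suite, engine_answers), "/", len(suite))  # → 1 / 2
```

In practice the engine answers would come from running each position through a UCI engine at a fixed time or depth, but the scoring logic is the same.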

 

So I then moved on to a smaller 100-position test set, with a view to looking at coverage.

Out of a possible score of 100, I obtained:

85 CorChess 1.2

84 Matefinder

82 Raubfisch ME262

81 Sugar Xpro 1.0

80 AsmFish

80 Houdini 1.5

79 Houdini 5

78 Komodo 10.4

74 DS 13

71 Fritz 15

71 Gull 3

68 Andscacs

65 Hiarcs 14

59 Hakkapeliitta

57 Critter 1.6

57 Chiron 3

41 Fire 4


 

I did run other, older engines through the test, but in general their scores were at the bottom of the table. I also ran other Stockfish derivatives through the test suite, but these simply crowded out the top of the table.

So from this, we see that even the best modern engines are missing around 15% of these positions.

Some might be surprised that Houdini 1.5 does so well. To a certain extent this is due to the way that chess engines are developed. What matters most these days is producing engines which beat other engines and sit atop the rating lists; this, combined with the self-play testing methods used, means that on occasion one loss is sacrificed to gain fifty wins, and the “baby goes out with the bathwater”.

Does using a range of engines improve accuracy?

Even from this small-scale test, the answer appears to be a clear yes!

The best single score is 85/100, but with a combination of Engines, we reach 99/100!


 

So, looking at a few combinations…

With CorChess 1.2/Houdini 5/Komodo 10.4 we get 92/100.


 

For 99/100 we need(!!)

MateFinder/Houdini 5/Fritz 15/Sugar/DS13/Gull 3 and Hakkapeliitta – rather impractical!


 

A rather more interesting choice is:

CorChess (or another Stockfish derivative) supported by Houdini 1.5 and Hakkapeliitta, with a score of 96/100.
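Searching for a good combination like this can be framed as a set-cover problem: each engine “covers” the set of positions it solves, and a combination's score is the size of the union of those sets. A minimal greedy sketch in Python – the engine names and solved-position sets below are invented for illustration, not the actual test results:

```python
def coverage(solved_by, engines):
    """Positions solved by at least one engine in the combination."""
    return set().union(*(solved_by[e] for e in engines))

def greedy_combo(solved_by, target):
    """Greedily pick engines until `target` positions are covered.

    Classic greedy set-cover heuristic: at each step add the engine
    that solves the most still-unsolved positions.
    """
    chosen, covered = [], set()
    while len(covered) < target:
        best = max(solved_by, key=lambda e: len(solved_by[e] - covered))
        if not solved_by[best] - covered:
            break  # no engine adds anything new; target unreachable
        chosen.append(best)
        covered |= solved_by[best]
    return chosen, covered

# Hypothetical results over a 10-position suite -- illustrative only.
solved_by = {
    "EngineA": {1, 2, 3, 4, 5, 6, 7, 8},
    "EngineB": {1, 2, 3, 4, 5, 6, 9},
    "EngineC": {2, 3, 10},
}
print(len(coverage(solved_by, ["EngineA", "EngineC"])))  # → 9
combo, covered = greedy_combo(solved_by, target=10)
print(combo, len(covered))  # → ['EngineA', 'EngineB', 'EngineC'] 10
```

The greedy heuristic is not guaranteed to find the smallest covering combination, but with per-position results such as those in the attached spreadsheet it is a quick way to explore which small sets of engines complement each other.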


 

Can I leverage any of this knowledge in my own games?

The simplest takeaways from this brief analysis are:

(1) Engines do have gaps in their knowledge, which need to be taken into account both in analysis and in CC play.

(2) These knowledge gaps are not universal and using a combination of engines to analyse a position is likely to bring us closer to the “right” move.

(3) Not all older engines are obsolete and some do give new insights into a position.

(4) Even with a combination of engines, the player must still act as arbiter and, when doing so, should bear in mind that the highest-rated engine is not always correct.


 

This brings us to the end of this brief review of Engines in the Endgame. I include a spreadsheet with engine results should the reader be interested in different combinations.

Download

