On 11/04/12 21:24, Gavin Flower wrote:
On 11/04/12 19:15, Sidney Cadot wrote:There was a blatantly obvious flaw in the above query: the pices checked, should belong to the same position!Dear all, As a hobby project, I am toying around with a database containing about 5 million chess games. On average, these games have about 80 positions (~ 40 moves by both black and white), which means there are about 400 million chess positions in there. I have written code to extract these positions, and now I want to put them into a Postgres database. Specifically, I want to do this in a way that allows *fast* lookups of positions, e.g. "give me all positions that have a White King on c4 and either a Black Bishop or White Knight on f7". Currently, my "Positions" table looks like this: Column | Type | Modifiers -------------------+---------+----------- gameindex | integer | not null plyindex | integer | not null pseudofenboard | text | not null fenside | text | not null fencastling | text | not null fenenpassant | text | not null possiblemovecount | integer | not null isincheck | boolean | not null Indexes: "positions_pkey" PRIMARY KEY, btree (gameindex, plyindex) Foreign-key constraints: "positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES games(gameindex) The "PseudoFenBoard" field currently holds a string describing the position. For example, the starting position of chess looks like this: "rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR" This design allows me to formulate the kind of positional queries that I want (by using regular _expression_ matching), but executing them will involve a slow, linear traversal of the 400M table rows, which is not desirable. I am toying around with the ugly idea to make a "Positions" table that has a single field for each of the squares, e.g. CREATE TABLE Position2 ( GameIndex INTEGER NOT NULL, PlyIndex INTEGER NOT NULL, a1 "char" NOT NULL, a2 "char" NOT NULL, -- (60 fields defs omitted) h7 "char" NOT NULL, h8 "char" NOT NULL ); This would allow the creation of indices on each of the 64 fields separately, which should help to achieve near-instantaneous position query performance, especially after gathering proper statistics for all the field-specific indices. I realize that this design is quite ugly, so I would be interested to hear if there are nicer alternatives that can perform equally well. Also, above I use the 1-byte "char" type. Is this the only type in PostGres that is guaranteed to be just a single byte, or are there better alternatives? A 13-state enum would be best (listing the 6 white pieces, 6 black pieces, and 'empty' states for every square on the board) but as I understand from the documentation, enums always up take 4 bytes per entry. Any ideas for improvement would be greatly appreciated. That I only discovered the flaw when I mentally checked the SQL on the way to work the folowing day. The following, hopefully, fixes the problem SELECT p1.position_id FROM piece AS p1 JOIN piece AS p2 USING (position_id) WHERE ( p1.white AND p1.type = 'K' AND p1.file = 'c' AND p1.rank = '4' ) AND ( ((NOT p2.white AND p2.type = 'B') OR (p2.white AND p2.type = 'K')) AND p2.file = 'f' AND p2.rank = '7' ); |