On 11/04/12 19:15, Sidney Cadot wrote:
Dear all, As a hobby project, I am toying around with a database containing about 5 million chess games. On average, these games have about 80 positions (~ 40 moves by both black and white), which means there are about 400 million chess positions in there. I have written code to extract these positions, and now I want to put them into a Postgres database. Specifically, I want to do this in a way that allows *fast* lookups of positions, e.g. "give me all positions that have a White King on c4 and either a Black Bishop or White Knight on f7". Currently, my "Positions" table looks like this: Column | Type | Modifiers -------------------+---------+----------- gameindex | integer | not null plyindex | integer | not null pseudofenboard | text | not null fenside | text | not null fencastling | text | not null fenenpassant | text | not null possiblemovecount | integer | not null isincheck | boolean | not null Indexes: "positions_pkey" PRIMARY KEY, btree (gameindex, plyindex) Foreign-key constraints: "positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES games(gameindex) The "PseudoFenBoard" field currently holds a string describing the position. For example, the starting position of chess looks like this: "rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR" This design allows me to formulate the kind of positional queries that I want (by using regular _expression_ matching), but executing them will involve a slow, linear traversal of the 400M table rows, which is not desirable. I am toying around with the ugly idea to make a "Positions" table that has a single field for each of the squares, e.g. CREATE TABLE Position2 ( GameIndex INTEGER NOT NULL, PlyIndex INTEGER NOT NULL, a1 "char" NOT NULL, a2 "char" NOT NULL, -- (60 fields defs omitted) h7 "char" NOT NULL, h8 "char" NOT NULL ); This would allow the creation of indices on each of the 64 fields separately, which should help to achieve near-instantaneous position query performance, especially after gathering proper statistics for all the field-specific indices. I realize that this design is quite ugly, so I would be interested to hear if there are nicer alternatives that can perform equally well. Also, above I use the 1-byte "char" type. Is this the only type in PostGres that is guaranteed to be just a single byte, or are there better alternatives? A 13-state enum would be best (listing the 6 white pieces, 6 black pieces, and 'empty' states for every square on the board) but as I understand from the documentation, enums always up take 4 bytes per entry. Any ideas for improvement would be greatly appreciated. How aboutsomething
like the following (game and postion would have more fields in
practice, like comments and where played)?
DROP TABLE IF EXISTS game CASCADE;
CREATE
TABLE game
CREATE
TABLE position
CREATE
TABLE piece
CREATE UNIQUE INDEX square ON piece (rank, file, type, white);
SELECT
Cheers, |