On 11 April 2012 09:15, Sidney Cadot <sidney@xxxxxxxxx> wrote: > Dear all, > > As a hobby project, I am toying around with a database containing > about 5 million chess games. On average, these games have about 80 > positions (~ 40 moves by both black and white), which means there are > about 400 million chess positions in there. > > I have written code to extract these positions, and now I want to put > them into a Postgres database. Specifically, I want to do this in a > way that allows *fast* lookups of positions, e.g. "give me all > positions that have a White King on c4 and either a Black Bishop or > White Knight on f7". > > Currently, my "Positions" table looks like this: > > Column | Type | Modifiers > -------------------+---------+----------- > gameindex | integer | not null > plyindex | integer | not null > pseudofenboard | text | not null > fenside | text | not null > fencastling | text | not null > fenenpassant | text | not null > possiblemovecount | integer | not null > isincheck | boolean | not null > Indexes: > "positions_pkey" PRIMARY KEY, btree (gameindex, plyindex) > Foreign-key constraints: > "positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES > games(gameindex) > > The "PseudoFenBoard" field currently holds a string describing the > position. For example, the starting position of chess looks like this: > > "rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR" > > This design allows me to formulate the kind of positional queries that > I want (by using regular expression matching), but executing them will > involve a slow, linear traversal of the 400M table rows, which is not > desirable. > > I am toying around with the ugly idea to make a "Positions" table that > has a single field for each of the squares, e.g. > > CREATE TABLE Position2 ( > GameIndex INTEGER NOT NULL, > PlyIndex INTEGER NOT NULL, > a1 "char" NOT NULL, > a2 "char" NOT NULL, > -- (60 fields defs omitted) > h7 "char" NOT NULL, > h8 "char" NOT NULL > ); > > This would allow the creation of indices on each of the 64 fields > separately, which should help to achieve near-instantaneous position > query performance, especially after gathering proper statistics for > all the field-specific indices. > > I realize that this design is quite ugly, so I would be interested to > hear if there are nicer alternatives that can perform equally well. > > Also, above I use the 1-byte "char" type. Is this the only type in > PostGres that is guaranteed to be just a single byte, or are there > better alternatives? No, you're using the unlimited length "char" type. You probably mean to use either char(1) or varchar(1). It wouldn't hurt to make those chars a foreign key to a chess-piece table, so that you both define what a certain code means on the chess-board and constrain which codes are available. You could possibly take that a bit further by finding a constraint that limits the number of each piece on the board (exactly 1 white king, up to 2 white towers, up to 8 white pawns, etc), but I have no ready idea of how you could achieve that... > A 13-state enum would be best (listing the 6 > white pieces, 6 black pieces, and 'empty' states for every square on > the board) but as I understand from the documentation, enums always up > take 4 bytes per entry. > Any ideas for improvement would be greatly appreciated. Well, since a chess-board is a matrix, it seems to make sense to describe it as a two-dimensional array of varchar(1)'s. I don't know off the top of my head what the storage requirements for that are though. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general