On Wed, Jun 15, 2016 at 03:09:04PM +0200, Ivan Voras wrote: > On 15 June 2016 at 15:03, ktm@xxxxxxxx <ktm@xxxxxxxx> wrote: > > > On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote: > > > Hi, > > > > > > I have an application which stores a large amounts of hex-encoded hash > > > strings (nearly 100 GB of them), which means: > > > > > > - The number of distinct characters (alphabet) is limited to 16 > > > - Each string is of the same length, 64 characters > > > - The strings are essentially random > > > > > > Creating a B-Tree index on this results in the index size being larger > > than > > > the table itself, and there are disk space constraints. > > > > > > I've found the SP-GIST radix tree index, and thought it could be a good > > > match for the data because of the above constraints. An attempt to create > > > it (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes > > more > > > than 12 hours (while a similar B-tree index takes a few hours at most), > > so > > > I've interrupted it because "it probably is not going to finish in a > > > reasonable time". Some slides I found on the spgist index allude that > > both > > > build time and size are not really suitable for this purpose. > > > > > > My question is: what would be the most size-efficient index for this > > > situation? > > > > Hi Ivan, > > > > If the strings are really random, then maybe a function index on the first > > 4, 8, or 16 characters could be used to narrow the search space and not > > need > > to index all 64. If they are not "good" random numbers, you could use a > > hash > > index on the strings. It will be much smaller since it currently uses a > > 32-bit > > hash. It has a number of caveats and is not currently crash-safe, but it > > seems > > like it might work in your environment. You can also use a functional > > index on > > a hash-function applied to your values with a btree to give you crash > > safety. > > > > > Hi, > > I figured the hash index might be helpful and I've tried it in the > meantime: on one of the smaller tables (which is 51 GB in size), a btree > index is 32 GB, while the hash index is 22 GB (so btree is around 45% > larger). > > I don't suppose there's an effort in progress to make hash indexes use WAL? > :D Hi Ivan, Several people have looked at it but it has not made it to the top of anyone's to-do list. So if you need WAL and crash-safety, a functional index on a hash of your values is currently your best bet. Regards, Ken -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance