On Wed, Jun 15, 2016 at 6:34 AM, Ivan Voras <ivoras@xxxxxxxxx> wrote:
>
> I have an application which stores large amounts of hex-encoded hash
> strings (nearly 100 GB of them), which means:
>
> - The number of distinct characters (the alphabet) is limited to 16
> - Each string is of the same length, 64 characters
> - The strings are essentially random
>
> Creating a B-tree index on this results in the index size being larger
> than the table itself, and there are disk space constraints.
>
> I've found the SP-GiST radix tree index and thought it could be a good
> match for the data because of the above constraints. An attempt to create
> it (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes
> more than 12 hours (while a similar B-tree index takes a few hours at
> most), so I've interrupted it because "it probably is not going to finish
> in a reasonable time". Some slides I found on the SP-GiST index suggest
> that both build time and size are not really suitable for this purpose.

I've found that a btree index over a hash of the value tends to perform
well in these situations:

CREATE INDEX ON t USING btree (hashtext(fieldname));

However, you'll have to modify your queries to filter on both the hash
and the text itself:

SELECT * FROM t
WHERE hashtext(fieldname) = hashtext('blabla')
  AND fieldname = 'blabla';
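This works because hashtext() returns a plain 32-bit integer, so the index
entries are 4-byte keys instead of 64-character strings; the trade-off is
that different strings can hash to the same integer, which is why the query
has to re-check the original value. Keep in mind that hashtext() is an
internal, undocumented PostgreSQL function, not part of the documented API.
A minimal end-to-end sketch, using the placeholder names (t, fieldname,
'blabla') from the examples above:

-- Placeholder table; in practice fieldname holds the 64-character hex strings.
CREATE TABLE t (fieldname text);

-- hashtext() returns int4, so the btree stores 4-byte keys rather than the
-- full 64-character strings, keeping the index much smaller than a btree
-- directly on fieldname.
CREATE INDEX ON t USING btree (hashtext(fieldname));

-- Different strings can collide on the same hash value, so the second
-- condition re-checks the actual string after the index lookup.
SELECT *
  FROM t
 WHERE hashtext(fieldname) = hashtext('blabla')
   AND fieldname = 'blabla';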