creating hash indexes

Rick Otten <rottenwindfish@xxxxxxxxx> · Wed, 14 Dec 2022 15:03:42 -0500

I inherited a database with several single-digit billion row tables.  Those tables have a varchar(36) column populated with uuids (all connected to each other via FKs) each currently supported by a btree index.
After the recent conversations about hash indexes I thought I'd do some comparisons to see if using a hash index could help and perhaps depriortize my burning desire to change the data type.  We never look up uuids with inequalities after all.  Indeed, in my test environments the hash index was half the size of the btree index, and the select performance was slightly faster than btree lookups. varchar(36) with hash index was roughly comparable to using a uuid data type (btree or hash index).

I was pretty excited until I tried to create the index on a table with the data (instead of creating it ahead of time and then loading up the test data).

Working in PG 14.5, on a tiny 9M row table, in an idle database, I found:
- creating the btree index on the varchar(36) column to consistently take 7 seconds
- creating the hash index on the varchar(36) to consistently take 1 hour

I was surprised at how dramatically slower it was.   I tried this on both partitioned and non-partitioned tables (with the same data set) and in both cases the timings came out similar.

I also tried creating a hash index on a varchar(100) column, also with 9M rows. I gave up after it did not complete after several hours.  (it wasn't locked, just slow)

While I was experimenting with the different index types, I did some insert tests.  After putting the hash index on the column, the inserts were significantly slower.  The btree index was 6-7x slower than no index, and the hash index was 100x slower than no index.

Assuming I can live with the slower inserts, is there any parameter in particular I can tweak that would make the time it takes to create the hash index closer to the btree index creation time?  In particular if I wanted to try this on a several billion row table in a busy database?

---

FWIW, from my tests on my laptop, on a 250M row table last weekend, after 100K selects:

MEAN (ms)  | btree    |  hash
---------  | -------  |  ----
varchar    | 28.14916 | 27.03769
uuid       | 27.04855 | 27.64424

and the sizes

SIZE    | btree    | hash
----    | -----    | ----
varchar | 12 GB    | 6212 MB
uuid    | 6595 MB  | 6212 MB

-  As long as the index fits in memory, varchar btree isn't really that much slower in postgresql 14 (the way it was a few years ago), so we'll probably just live with that for the forseeable future given the complexity of changing things at the moment.

--
Rick