On Mon, Sep 19, 2011 at 8:19 AM, Robert Klemme <shortcutter@xxxxxxxxxxxxxx> wrote: > On Mon, Sep 19, 2011 at 4:04 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote: > >> The other way to go of course is to try and fix up the existing hash >> index code -- add wal logging, etc. In theory, a customized hash >> structure should be able to beat btree all day long which argues to >> continue in this direction. > > I still haven't seen a solution to locking when a hash table needs > resizing. All hashing algorithms I can think of at the moment would > require a lock on the whole beast during the resize which makes this > type of index impractical for certain loads (heavy updating). The current implementation doesn't EX lock the whole beast during resizing, except a brief one at the beginning (and maybe at the end?) of the split. It does EX lock the entire bucket being split for the duration of the split, though. The main problem that I see is that due to the potential for deadlocks, the locks involved have to be heavy-weight. Which means the shared locks which non-splitting processes have to use to block against those EX locks have to be heavy-weight too, and getting even shared heavy-weight locks means exclusive light-weight locks, which means contention. One way out would be to have a special process (probably vacuum) do all the resizing/splitting, rather than having regular backends doing it. It should be possible to make this dead-lock free. Another would be to make hash indexes only support bit-map scans. Then the scanning hash-code would never have a need to pass control back to the executor while still holding a bucket lock. Another would be to make the current position of the hash index scan be "refindable" if a split should occur during the scan, so that you don't need to inhibit splits during a scan of the same bucket. This would probably be easy if there were no overflow pages. But the overflow pages get shuffled in with each other and with the main bucket page during a split. It would take quite some gymnastics to get around that. Cheers, Jeff -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance