Re: Hash index use presently(?) discouraged since 2005: revive or bury it?

Merlin Moncure <mmoncure@xxxxxxxxx> · Thu, 15 Sep 2011 23:20:43 -0500

On Thu, Sep 15, 2011 at 8:00 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
> On Thu, Sep 15, 2011 at 5:38 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> Merlin Moncure <mmoncure@xxxxxxxxx> writes:
>>> HM, what if you junked the current hash indexam, and just implemented
>>> a wrapper over btree so that the 'hash index' was just short hand for
>>> hashing the value into a standard index?
>>
>> Surely creating such a wrapper would be *more* work than adding WAL
>> support to the hash AM.
>>
>> I'm not entirely following this eagerness to junk that AM, anyway.
>> We've put a lot of sweat into it over the years, in the hopes that
>> it would eventually be good for something.  It's on the edge of
>> being good for something now, and there's doubtless room for more
>> improvements, so why are the knives out?
>
> Just making an observation.  Some quick tests follow the sig.  I think
> the point here is that something has to be done -- now that the
> replication train has left the station, not having WAL has gone from
> quirky annoyance to major functionality failure.  The recent hash work
> has brought down index build times to a reasonable level, but they are
> still getting beat by btree.  Of course, it's not quite apples to
> apples (I figure the timings will even out to an extent once you add
> in the hashing wrapper), but I can't help but wonder if the btree code
> is a better driver and consolidating code is a good thing.

odd: I was pondering Claudio's point about maintenance of hash indexes
vs btree and decided to do some more tests.  Something very strange is
happening:  I decided to compare 'update v set x=x+1', historically
one of postgres's weaker points, on the 10M table indexed hash vs
btree.  The btree typically muddled through in about 5 minutes:

postgres=# update v set x=x+1;
UPDATE 10000000
Time: 302341.466 ms

recreating the table and hash index, I ran it again. 47 minutes into
the query, I started to get curious and noticed that cpu time disk
usage are hovering near zero but nothing is blocked. disk space on the
index is *slowly* increasing, now at:
09/15/2011  11:08 PM       541,024,256 16531

this is obviously, uh, windows, and I don't have good tools set up.
I'll repeat the test when i get into the office this morning.  thought
I'd point it out.  hm, cancelled the query, dropped the index, and
re-ran the update without any indexes, everything is normal -- the
update whistled through the table in about 35 seconds.

hm, recreated the hash index and looking more carefully now.  still
seeing the lousy behavior.  this definitely bears more investigation
(this is 9.0.4)...

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance