Re: How to make hash indexes fast

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Sep 18, 2011 at 6:14 PM, Craig James <craig_james@xxxxxxxxxxxxxx> wrote:
> Regarding the recent discussion about hash versus B-trees: Here is a trick I
> invented years ago to make hash indexes REALLY fast.  It eliminates the need
> for a second disk access to check the data in almost all cases, at the cost
> of an additional 32-bit integer in the hash-table data structure.

I don't see how that can work unless almost all of your queries return
zero rows.  If it returns rows, you have to go get those rows in order
to return them.  And you also have to go get them to determine if they
are currently visible to the current transaction.  This sounds like a
highly specialized data structure for a highly specialized situation.

> Using this technique, we were able to load a hash-indexed database with data
> transfer rates that matched a cp (copy) command of the same data on Solaris,
> HP-UX and IBM AIX systems.
>
> You build a normal hash table with hash-collision chains.  But you add
> another 32-bit integer "signature" field to the hash-collision record (call
> it "DBsig").  You also create a function:
>
>  signature = sig(key)
>
> that produces digital signature.  The critical factor in the sig() function
> is that there is an average of 9 bits set (i.e. it is somewhat "sparse" on
> bits).
>
> DBsig for a hash-collision chain is always the bitwise OR of every record in
> that hash-collision chain.  When you add a record to the hash table, you do
> a bitwise OR of its signature into the existing DBsig.  If you delete a
> record, you erase DBsig and rebuild it by recomputing the signatures of each
> record in the hash-collision chain and ORing them together again.

Since PG doesn't store the keys in the hash index, this would mean
visiting the table for every entry with the same hash code.

>
> That means that for any key K, if K is actually on the disk, then all of the
> bits of sig(K) are always set in the hash-table record's "DBsig".  If any
> one bit in sig(K) isn't set in "DBsig", then K is not in the database and
> you don't have to do a disk access to verify it.  More formally, if
>
>   sig(K) AND DBsig != sig(K)
>
> then K is definitely not in the database.
>
> A typical hash table implementation might operate with a hash table that's
> 50-90% full, which means that the majority of accesses to a hash index will
> return a record and require a disk access to check whether the key K is
> actually in the database.

PG hash indexes do not typically operate at anywhere near that full,
because PG stores the entire 32 bit hash value.  Even if there are
only 8 buckets, and so only the bottom 3 bits are used to identify the
bucket, once in the bucket all 32 bits are inspected for collisions.
So on tables with less than several hundred million, collisions are
rare except for identical keys or malicious keys.  And if you want
tables much larger than that, you should probably go whole hog and
switch over to 64 bit.

So the fullness of the hash-value space and the fullness of the actual
table are two different things in PG.


> With the signature method, you can eliminate over
> 99.9% of these disk accesses -- you only have to access the data when you
> actually want to read or update it.  The hash table can usually fit easily
> in memory even for large tables, so it is blazingly fast.
>
> Furthermore, performance degrades gracefully as the hash table becomes
> overloaded.  Since each signature has 9 bits set, you can typically have
> 5-10 hash collisions (a lot of signatures ORed together in each record's
> DBsig) before the false-positive rate of the signature test gets too high.

But why not just distribute those 32/(5 to 10) bits to the ordinary
hash space, increasing them from 32 to 35 bits, rather than creating a
separate hash space?  Doesn't that get you the same resolving power?

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux