On 1/24/15 9:03 AM, Adrian Klaver wrote:
On 01/23/2015 10:42 PM, Ravi Kiran wrote:
hi,
I want to know what kind of hash function postgresql uses while joining.
I was debugging through gdb, I found out that it is not using bob
jenkins hash function but a different hash function *hash_uint32() and
hash_any() *functions if the joining attribute is an integer, and a
different kind of hash function for a different type of joining
attribute.
I want to know whether the hash functions will change if the number of
tuples in the table are very large or very low, and if it changes,
please tell me what hash function it uses if the tuples are very large
and what hash function it uses if the number is very low, also I came to
know that the hash function will change depending on the type of the
attribute on which the join takes place, but will it always remains the
same for the integer type of joining attribute or will it it change.
Seems a good starting point is here:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/backend/access/hash;h=6e44d337d3dc3e8d5439e2dbbc54ce0c33d3b5d4;hb=8ca336f4ac3f08a5f23e76c6e9a5f2c8064f5883
http://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/backend/access/hash;hb=HEAD
is even better, since that's HEAD.
To answer your question, AFAIK there's no consideration made for the
number of tuples, and I rather doubt that would help much. If you want
to go mucking around with this I think it'd be more useful to look at
using a different hash algorithm for types that could be significant in
size (namely varchar, text and bytea). There are algorithms newer than
Jenkins that look to be significantly faster on larger volumes of data.
You might find https://github.com/decibel/hash_testing useful; it's way
to test hash performance of several hashes.
https://github.com/decibel/postgres/tree/xxhash is where I pulled xxhash
into Postgres to try using it for hashing shared buffer identifiers. It
didn't help there, but it might be faster than Jenkins for things like
hashing text fields.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general