Search Postgresql Archives

Re: hash function in Postgres

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/24/15 9:03 AM, Adrian Klaver wrote:
On 01/23/2015 10:42 PM, Ravi Kiran wrote:
hi,


I want to know what kind of hash function postgresql uses while joining.
I was debugging through gdb, I found out that it is not using bob
jenkins hash function but a different hash function *hash_uint32() and
hash_any() *functions if the joining attribute is an integer, and a
different kind of hash function for a different type of joining
attribute.

I want to know whether the hash functions will change if the number of
tuples in the table are very large or very low, and if it changes,
please tell me what hash function it uses if the tuples are very large
and what hash function it uses if the number is very low, also I came to
know that the hash function will change depending on the type of the
attribute on which the join takes place, but will it always remains the
same for the integer type of joining attribute or will it it change.

Seems a good starting point is here:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/backend/access/hash;h=6e44d337d3dc3e8d5439e2dbbc54ce0c33d3b5d4;hb=8ca336f4ac3f08a5f23e76c6e9a5f2c8064f5883

http://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/backend/access/hash;hb=HEAD is even better, since that's HEAD.

To answer your question, AFAIK there's no consideration made for the number of tuples, and I rather doubt that would help much. If you want to go mucking around with this I think it'd be more useful to look at using a different hash algorithm for types that could be significant in size (namely varchar, text and bytea). There are algorithms newer than Jenkins that look to be significantly faster on larger volumes of data.

You might find https://github.com/decibel/hash_testing useful; it's way to test hash performance of several hashes. https://github.com/decibel/postgres/tree/xxhash is where I pulled xxhash into Postgres to try using it for hashing shared buffer identifiers. It didn't help there, but it might be faster than Jenkins for things like hashing text fields.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux