On 12/10/19 3:11 PM, Erwin Brandstetter wrote:
I am looking for stable hash functions producing 8-byte or 4-byte hashes
from long text values in Postgres 10 or later.
There is md5(), the result of which can be cast to uuid. This reliably
produces practically unique, stable 16-byte values. I have usecases where
an 8-byte or even 4-byte hash would be good enough to make collisions
reasonably unlikely. (I can recheck on the full string) - and expression
indexes substantially smaller. I could truncate md5 and cast back and
forth, but that seems like a lot of wasted computation. Are there
suggestions for text hash functions that are
- fast
- keep collisions to a minimum
- stable across major Postgres versions (so expression indexes don't break)
- croptographic aspect is not needed (acceptable, but no benefit)
What about a CRC32 function? It's fast, and an SSE4 instruction has been in
Intel CPUs for about 10 years.
There is an old post from 2012 by Tom Lane suggesting that hashtext() and
friends are not for users:
https://www.postgresql.org/message-id/24463.1329854466%40sss.pgh.pa.us
Postgres 11 added hashtextextended() and friends to generate bigint
hashes. In a more recent post from 3 months ago, Tom suggests to use it in
user-land - if portability is not needed:
https://www.postgresql.org/message-id/9434.1568839177%40sss.pgh.pa.us
Is pghashlib by Marko Kreen my best option?
https://github.com/markokr/pghashlib
Or the "version-independent hash functions for PostgreSQL" from Peter
Eisentraut:
https://github.com/petere/pgvihash
Neither received updates for a couple of years. Unmaintained? Or obsolete?
And neither is available on most hosted services like RDS or Heroku (which
would be required in come cases).
So what are my best options?
Regards
Erwin
--
Angular momentum makes the world go 'round.