Search Postgresql Archives

Re: Hash Function: MD5 or other?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 13 Jun 2005, at 23:49, Peter Fein wrote:


Hi-

I wanted to use a partially unique index (dependent on a flag) on a TEXT
column, but the index row size was too big for btrees.  See the thread
"index row size 2728 exceeds btree maximum, 2713" from the beginning of this month for someone with a similar problem. In it, someone suggests
indexing on a hash of the text.  I'm fine with this, as the texts in
question are similar enough to each other to make collisions unlikely
and a collision won't really cause any serious problems.

My question is: is the builtin MD5 appropriate for this use or should I be using a function from pl/something? Figures on collision rates would
be nice as well - the typical chunk of text is probably 1k-8k.

Thanks!


As others have said MD5 isn't the fastest one out there. However no cryptographically secure hashes are really that fast. However you can probably get away with using a CRC hash which is long enough to reduce your chances of collision a lot. However, PostgreSQL doesn't have a built in CRC function, which is a bit of a pain unless your prepared to implement one, or use pl/* to do it, which sounds like overkill. I suggest you run some benchmarks on MD5 and see if it's fast enough to meet your current (and perhaps future) needs.

You could of course, just use a hash index on your text field! I think that would probably cope with larger data sets OK. It has the advantage of handling collisions for you as well :) However it means you have to transfer large amounts of data around, so if network speed ever becomes a limitation, MD5 hashing (or enabling compression on your PgSQL connection) may help.


--
Peter Fein pfein@xxxxxxxxx 773-575-0694

Basically, if you're not a utopianist, you're a schmuck. -J. Feldman

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org






---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to majordomo@xxxxxxxxxxxxxx)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux