On 13 Jun 2005, at 23:49, Peter Fein wrote:
Hi-
I wanted to use a partially unique index (dependent on a flag) on a
TEXT
column, but the index row size was too big for btrees. See the thread
"index row size 2728 exceeds btree maximum, 2713" from the
beginning of
this month for someone with a similar problem. In it, someone
suggests
indexing on a hash of the text. I'm fine with this, as the texts in
question are similar enough to each other to make collisions unlikely
and a collision won't really cause any serious problems.
My question is: is the builtin MD5 appropriate for this use or
should I
be using a function from pl/something? Figures on collision rates
would
be nice as well - the typical chunk of text is probably 1k-8k.
Thanks!
As others have said MD5 isn't the fastest one out there. However no
cryptographically secure hashes are really that fast. However you
can probably get away with using a CRC hash which is long enough to
reduce your chances of collision a lot. However, PostgreSQL doesn't
have a built in CRC function, which is a bit of a pain unless your
prepared to implement one, or use pl/* to do it, which sounds like
overkill. I suggest you run some benchmarks on MD5 and see if it's
fast enough to meet your current (and perhaps future) needs.
You could of course, just use a hash index on your text field! I
think that would probably cope with larger data sets OK. It has the
advantage of handling collisions for you as well :) However it means
you have to transfer large amounts of data around, so if network
speed ever becomes a limitation, MD5 hashing (or enabling compression
on your PgSQL connection) may help.
--
Peter Fein pfein@xxxxxxxxx
773-575-0694
Basically, if you're not a utopianist, you're a schmuck. -J. Feldman
---------------------------(end of
broadcast)---------------------------
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@xxxxxxxxxxxxxx)