On 29/01/2010 4:20 PM, Joe Kramer wrote:
Hello,
I need to generate unique id which is not guessable unlike
serial(integer) type. I need an id in format like md5 hash of random
number.
On top of that I need this id to be unique across multiple tables.
Anyone had to solve this problem before? Can you post any recipes or
best practices please?
My questions:
1. Avoiding collisions.
If I make an UNIQUE constraint and do generation of id triggered on
INSERT. What if collision happens? DO I nee d to check if unique hash
already exists and if not- regenerate.
This looks too primitive. Is there a readily available function or
methodology to do that?
2. Generating global unique id across multiple tables.
How to do that? My only idea is to have separate table to keep all
hashes and compare for collision against that table.
Is there a better way? Maybe by creating some special serial type that
is not integer but varchar?
3. what function to use to generate 64-bit random hash without much
overhead to CPU?
When I ran into something somewhat akin to this I asked the list about a
non-repeating pseudo-random mapping function. Daniel Verite (on this
list) enlightened me about Feistel networks/cyphers, and even posted a
PL/PgSQL implementation!
It's documented here:
http://wiki.postgresql.org/wiki/Pseudo_encrypt
and has been extremely handy. It should fit you needs - just define a
sequence that you pull new input values from, and use that same sequence
across all tables that need unique values.
It's not trivial to assert, across multiple tables, that a value is
unique. You could do it with a trigger that checks each table that uses
such values (slow-ish but effective) or maintains a side table of values
in use. In either case it'd have to be added to every table that used
the pseudo_encrypt values.
By the way, if you intend to expose these to users you might also want
some kind of data entry error checking so that a typo can't accidentally
transform id `n' to id `m' with transposition of a single digit or the
like. The luhn algorithm provides a good way to do that with a simple
check digit - it's not cryptographically strong in that if you know the
values are checked with the luhn algorithm it's trivial to re-generate
the check digit, but it helps a log against casual scanning of the
number space and against accidental user data entry error. Check out:
http://wiki.postgresql.org/wiki/Luhn_algorithm
--
Craig Ringer
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general