Re: How to generate unique hash-type id?

Joe Kramer <cckramer@xxxxxxxxx> · Fri, 29 Jan 2010 22:51:20 +1100

Thanks for the answer,

I am unable to use ossp_uuid due to package install and/or server
rebuild requirement.

So I am trying to roll my own, and
digest(quote_literal(random()+random()), 'sha256'), 'hex') doesn't
work:

I have created this table and inserted 200000 rows (two million).
This is more or less now my application looks now. It uses bigserial.
And I need to add some unique hash:
CREATE TABLE item
(
  item_id bigserial NOT NULL,
  title character varying,
  CONSTRAINT pk PRIMARY KEY (item_id)
)
WITH (
  OIDS=FALSE
);

Now I add the hash column:
ALTER TABLE item ADD COLUMN hash1 character varying NOT NULL DEFAULT
encode(digest(quote_literal(random()+random()), 'sha256'), 'hex');
ALTER TABLE item ADD UNIQUE (hash1);

When I executed this two statements, ALTER TABLE ADD COUMN, ADD
UNIQUE, after  20 seconds I got this message:
NOTICE:  ALTER TABLE / ADD UNIQUE will create implicit index
"item_hash1_key" for table "item"
ERROR:  could not create unique index "item_hash1_key"
DETAIL:  Table contains duplicated values.
********* Error **********
ERROR: could not create unique index "item_hash1_key"
SQL state: 23505
Detail: Table contains duplicated values.

So this means random()+random() is not random even within 2,000,000 iterations!

If you suggest accessing /dev/urandom directly- I cannot do that
because my application runs on mac,windows and linux. It would be
maintenance nightmare.

Any suggestions?

Thanks.

On Fri, Jan 29, 2010 at 10:20 PM, Adrian von Bidder
<avbidder@xxxxxxxxxxx> wrote:
> On Friday 29 January 2010 11.21:00 Joe Kramer wrote:
>> We have bunch of servers running the app and rebuilding postgres with
>> support for ossp_uuid on all servers is time consuming.
>> Is there a way of doing it without third party dependency like
>> ossp_uuid? Should I just run md5(random number), will itbe the same ?
>
> If you're building your own: at least use sha1 instead of md5.
>
> (Even md5 *should* be safe in the absence of malicious attacks, but md5 is
> generally  not recommended anymore.)
>
> Everything depends on the quality of your random numbers.  I don't know how
> much randomness pg's random() delivers, and as I've said I haven't looked
> what the uuid module does.
>
> (To give you an example: if random() only delivers a random 32 bit float
> value, the 160 bits of SHA-1 will not be used.  You'll only use 4 billion
> different values and you *will* soon get collisions.)
>
> If I were to roll my own, I'd just use 256 bit of /dev/random (or, depending
> on the application, possibly /dev/urandom and take the risk that my values
> aren't that random.)  Since it's random anyway, there's no need to use a
> hash.  (Not sure: can a SQL function read arbitrary binary files or will a C
> module be necessary?)
>
> Speed: just did a quick test on one machine.  reading 1kB from /dev/random
> takes about 1s.  (constant 5MB/s disk activity with lots of seeking going
> on, no hw random device.)  So you'd get ca. 32 id values per second.  Don't
> know if that's a lot or not for your application.
>
> Magnus: can you elaborate on uuid not being secure?  AFAICT v4 uuid are
> supposed to be essentially a random number formatted in a certain way.
>
> cheers
> -- vbi
>
>
> --
> featured product: GNU Privacy Guard - http://gnupg.org
>

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general