Re: UUID column as primary key?

Alban Hertroys <dalroi@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> · Thu, 6 Jan 2011 09:02:43 +0100

On 6 Jan 2011, at 24:27, Chris Browne wrote:

>> Next to that, UUID's are generated by computers. I have no doubts that
>> the numeric space that makes up a UUID allows for collision chances as
>> low as described, but are computers capable of generating those
>> numbers sufficiently random that they actually achieve that low a
>> chance? I think that's pushing it.
> 
> RFC 4122 does NOT point to randomness as the only criterion to
> discourage collisions, and treating UUIDs as if they were merely about
> being "sufficiently random to achieve low chance of collision" is
> insulting to the drafters of the standard, because they were certainly
> NOT so naive as to think that was sufficient.

I'm sure the designers knew what they were getting into. This comment was aimed at people claiming things like "with this and that huge number of events a collision won't occur in 100 billion years", which - to me at least - suggests they're looking only at the large number of bits involved, without understanding the statistics.
Let's just say, if the developers of "Microsoft Visual Nuclear Power Plant Designer Professional" were claiming things like that, would you trust their product?
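
For reference, the arithmetic behind such claims is usually the birthday bound. The sketch below is only an illustration under stated assumptions: it assumes version-4 UUIDs with 122 random bits drawn uniformly and independently, which is exactly the property being questioned above. The function names are made up for this example.

```python
import math

# Assumption: version-4 UUIDs with 122 random bits, drawn uniformly and
# independently. A weak or badly seeded generator breaks this assumption,
# and then the numbers below no longer apply.
RANDOM_BITS = 122


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2 ** bits
    exponent = -n * (n - 1) / (2 * space)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)


def ids_for_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Approximate number of IDs needed to reach collision probability p."""
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - p)))


if __name__ == "__main__":
    for n in (10**6, 10**9, 10**12):
        print(f"{n:>16,d} ids -> p ~= {collision_probability(n):.3e}")
    print(f"50% collision chance after ~{ids_for_probability(0.5):.3e} ids")
```

Note that the result is a probability over the whole set of IDs; it says nothing about when a collision would happen, and it is only as good as the random number generator feeding it.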

The main point with the randomness of UUIDs remains that you _can_ have a collision at any given moment. It's unlikely to ever happen, but if it does, you can't predict when. The possible consequences of a collision matter a lot in deciding whether and how to handle these collisions. Maybe it doesn't matter at all, maybe you should get really hefty insurance, or maybe you need to evacuate the country.

By contrast, a sequence isn't random, so you can predict when you will run into collisions - namely once the sequence wraps. Considering that even a 32-bit sequence allows for several billion rows before collisions _can_ occur, you can be certain that your problem is pretty far into the future.
It _will_ be a big problem without an obvious solution if it occurs, though: from that point on you will run into a lot of collisions, and the resolution depends heavily on what you're working on.
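
To make "you can predict when" concrete, here is a back-of-the-envelope sketch. It assumes a signed 32-bit sequence that counts up by 1 (maximum 2**31 - 1) and a constant insert rate; the rates in the example are hypothetical, and an unsigned or 64-bit sequence would simply need a different max_value.

```python
# Assumption: a signed 32-bit sequence that increments by 1, so the largest
# value it can hand out is 2**31 - 1. The insert rates are hypothetical.
INT32_MAX = 2**31 - 1
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_wrap(current_value: int, inserts_per_second: float,
                     max_value: int = INT32_MAX) -> float:
    """Years left before the sequence runs out at a constant insert rate."""
    if inserts_per_second <= 0:
        raise ValueError("inserts_per_second must be positive")
    remaining = max(max_value - current_value, 0)
    return remaining / inserts_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for rate in (1, 100, 10_000):
        years = years_until_wrap(0, rate)
        print(f"{rate:>6} inserts/s -> wraps in about {years:.2f} years")
```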

Now, that is not an argument against protecting your application against collisions. If there is a chance that you will run into collisions (you won't in a 10-record lookup table, for example), then you need to take that into consideration in your designs, but there are many (usually obvious) cases in which it's safe to omit it. With UUIDs that's a little more complicated.
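
One possible shape for such protection is to let a unique constraint detect the collision and retry with a fresh key. The following is a minimal sketch, not a recommendation: it uses Python's sqlite3 module purely as a self-contained stand-in for the real database, and the table name, function names and retry limit are all made up for illustration.

```python
import sqlite3
import uuid
from typing import Callable


def make_store() -> sqlite3.Connection:
    """In-memory stand-in database; the items table is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, payload TEXT)")
    return conn


def insert_with_retry(conn: sqlite3.Connection, payload: str,
                      new_id: Callable[[], str] = lambda: str(uuid.uuid4()),
                      max_attempts: int = 3) -> str:
    """Insert a row under a fresh ID, retrying if the primary key collides."""
    for _ in range(max_attempts):
        candidate = new_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO items (id, payload) VALUES (?, ?)",
                    (candidate, payload),
                )
            return candidate
        except sqlite3.IntegrityError:
            continue  # key already taken: generate a new one and try again
    raise RuntimeError(f"no free id after {max_attempts} attempts")
```

Whether a silent retry is acceptable depends on the consequences discussed above; in some applications a collision should be logged or escalated rather than papered over.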

I don't think anyone in this discussion is saying "Don't use UUIDs!". Just be aware of their limitations and of the problem domains where they are sensible to use. The same goes for sequences.
It would, for example, be (obviously) pretty insane to use UUIDs for a 10-record lookup table. There are plenty of examples in this thread where they shine; I don't need to repeat those.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general