Re: UUID version 6 proposal, initial feedback

Laurence Lundblade <lgl@xxxxxxxxxxxxxxxxx> · Wed, 26 Feb 2020 11:09:25 -0800

The UUID format seems somewhat anachronistic, going back to a time when good HW RNGs were uncommon. That’s not true today: hardware with good RNGs was introduced between 2010 and 2015. The better choice today seems to be just a sequence of cryptographic-quality random bytes. This is already being done for nonce fields in some protocols, and those are not in UUID format.
Except, as discussed, true RNG IDs don’t work well with databases. 

One option is to say the databases should be fixed to work with true RNG IDs. In some cases this is what will have to happen.

Another is to design an ID that is database friendly, which is what this draft is about. Protocols can use that if they like, provided they can meet the generation requirements. Generation seems to require a clock or stored coordinate state.

For a new ID, it might be worth breaking free from the UUID format to design a better database-friendly ID. The UUID format doesn’t seem particularly necessary nowadays. It might be good to allow for more bits too (UUIDs are fixed at 128 bits, and slightly fewer are usable once the version and variant bits are set aside).
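
As a rough illustration of the plain random ID described above, here is a minimal Python sketch. The `random_id` name and the 20-byte default are assumptions made for illustration, not something proposed in this thread.

```python
import secrets
import uuid

# A version 4 UUID fixes 4 version bits and 2 variant bits,
# so only 122 of its 128 bits are random.
UUID4_RANDOM_BITS = 128 - 4 - 2


def random_id(n_bytes: int = 20) -> bytes:
    """Return n_bytes of cryptographic-quality random data (hypothetical helper)."""
    return secrets.token_bytes(n_bytes)


if __name__ == "__main__":
    print(uuid.uuid4())        # 122 random bits in UUID text format
    print(random_id().hex())   # 160 random bits, no fixed fields
```

An ID like this has no ordering at all, which is the database locality problem noted above.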

LL

On Feb 24, 2020, at 8:09 PM, Ben Ramsey <ben@xxxxxxxxxxxxx> wrote:

On 2/4/20 11:36 UTC, Rob Wilton (rwilton) wrote:
What you describe does sound to me like it could be a new form of
UUID (if limited to a 128 bit format), and it could potentially also
be useful.  E.g. a 128 bit UUID that has good database locality
properties and minimizes the leakage of private information sounds
useful if it can be reasonably specified and implemented.

I also note that RFC 4122 is 15 years old, and as Martin previously
indicated there are security and privacy considerations that have
evolved over time, hence updating RFC 4122 to make readers aware of
those considerations also seems like it could potentially be useful.

Writing this up as a draft sounds like a good next step to see if
there is enough wider interest.

FYI, Brad has submitted his first draft for review. You can see it here:
https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/

I've been following this for a while, and as the author of a popular
userland UUID library for PHP <https://github.com/ramsey/uuid>, I'd like
to throw my support behind this proposal and describe a few of the pain
points that have led application developers down the path of modifying
the existing UUID structure to better suit their needs.

As a standard, the UUID format is ubiquitous and portable. Despite some
of its shortcomings, and the desire (as some have raised on this list)
to create a new standard other than UUID, it's a desirable format, for
many reasons.

There is one primary shortcoming that results in a frequent need to
modify the format, and this is the shortcoming that Brad's version 6
UUID attempts to overcome. When developers begin storing UUIDs in
relational databases, they inevitably arrive at one or all of these
articles (which I'm surprised haven't yet been mentioned in this thread):

* http://www.informit.com/articles/printerfriendly/25862
* https://blog.codinghorror.com/primary-keys-ids-versus-guids/
* https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/

As a result, in my PHP library, I have implemented alternate _codecs_ to
encode/decode UUIDs in more optimal ways for database fields, especially
for use as primary keys. Two of these codecs are:

* Timestamp-first COMB
* Ordered Time UUID

The timestamp-first COMB is a version 4 UUID combined with a Unix
timestamp as the first 48 bits, resulting in a monotonically-increasing
UUID. For all intents and purposes, the resulting value always looks
like a version 4 UUID (the version and variant bits remain in the same
places as defined by RFC 4122).
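
To make the layout concrete, here is a minimal Python sketch of the idea. It is not the ramsey/uuid implementation: it assumes a millisecond Unix timestamp in the first 48 bits, and `comb_timestamp_first` and its parameters are hypothetical names.

```python
import os
import time
import uuid
from typing import Optional


def comb_timestamp_first(unix_ms: Optional[int] = None,
                         rand: Optional[bytes] = None) -> uuid.UUID:
    """Build a version-4-looking UUID whose first 48 bits are a timestamp."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000   # assumed unit: milliseconds
    if rand is None:
        rand = os.urandom(10)                   # remaining 80 bits
    raw = bytearray(unix_ms.to_bytes(6, "big") + rand)
    raw[6] = (raw[6] & 0x0F) | 0x40             # version nibble = 4
    raw[8] = (raw[8] & 0x3F) | 0x80             # RFC 4122 variant bits = 10
    return uuid.UUID(bytes=bytes(raw))


if __name__ == "__main__":
    print(comb_timestamp_first())
```

In this sketch, values sort by creation time only across different timestamps. Two values minted in the same millisecond are ordered by their random bits.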

The ordered time UUID is similar but retains the semantics of the
version 1 UUID. That is, the UUID can be deconstructed to produce a node
value, clock sequence value, and timestamp with 100-nanosecond fidelity. The
difference is that the timestamp is rearranged so that the UUID is
monotonically increasing.

The problem with this approach, though, is that the first 2 bytes are
the same as the time_hi_and_version field, which means the UUID version
now occupies the first 4 bits of the UUID. Unless you know how the bits
of this UUID were rearranged, there's no way to reliably tell that it
was originally a version 1 UUID.
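
A minimal Python sketch of this rearrangement, assuming the codec simply moves time_hi_and_version to the front, followed by time_mid and time_low. The function names are hypothetical and this is not the library's code.

```python
import uuid


def to_ordered_time(u: uuid.UUID) -> bytes:
    """Reorder version 1 UUID bytes so the high timestamp bits come first."""
    b = u.bytes
    # time_hi_and_version (2) + time_mid (2) + time_low (4) + clock_seq/node (8)
    return b[6:8] + b[4:6] + b[0:4] + b[8:]


def from_ordered_time(b: bytes) -> uuid.UUID:
    """Undo to_ordered_time; the caller must already know the layout."""
    return uuid.UUID(bytes=b[4:8] + b[2:4] + b[0:2] + b[8:])


if __name__ == "__main__":
    v1 = uuid.UUID("58e0a7d7-eebc-11d8-9669-0800200c9a66")
    stored = to_ordered_time(v1)
    print(stored.hex())              # 11d8eebc58e0a7d796690800200c9a66
    print(stored[0] >> 4)            # 1: the version is now in the first 4 bits
    print(from_ordered_time(stored) == v1)
```

Nothing in the stored bytes says they were rearranged, so a reader that parses them as a normal UUID would look for the version in the wrong place.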

Therein lies the problem. The use-case is for a version 1 UUID, from
which an application can retrieve the 100-nanosecond timestamp and node values,
while being monotonically increasing so that it does not scatter the
records in my database engine. But, by rearranging the bits to achieve
this, I'm placing a dependency on my application to know how to
deconstruct the bits when retrieving from the database. It's not very
portable, it's error-prone, and it can lead to developer confusion.

Brad's version 6 UUID solves this problem.
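
For illustration, here is a minimal Python sketch of how a version 1 UUID could be re-expressed this way. It assumes the version 6 layout keeps the version nibble in its RFC 4122 position and stores the 60-bit version 1 timestamp most-significant bits first around it (32 bits, 16 bits, version, 12 bits). The function names are hypothetical.

```python
import uuid


def v1_to_v6(u: uuid.UUID) -> uuid.UUID:
    """Re-pack a version 1 UUID with its timestamp in big-endian field order."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    ts = u.time                                  # 60-bit count of 100 ns ticks
    value = ((ts >> 28) & 0xFFFFFFFF) << 96      # high 32 bits of the timestamp
    value |= ((ts >> 12) & 0xFFFF) << 80         # middle 16 bits
    value |= 0x6 << 76                           # version nibble, usual position
    value |= (ts & 0xFFF) << 64                  # low 12 bits
    value |= u.int & 0xFFFFFFFFFFFFFFFF          # variant, clock_seq, node as-is
    return uuid.UUID(int=value)


def v6_timestamp(u: uuid.UUID) -> int:
    """Recover the 60-bit timestamp from the assumed version 6 layout."""
    i = u.int
    return ((i >> 96) << 28) | (((i >> 80) & 0xFFFF) << 12) | ((i >> 64) & 0xFFF)


if __name__ == "__main__":
    v1 = uuid.UUID("58e0a7d7-eebc-11d8-9669-0800200c9a66")
    v6 = v1_to_v6(v1)
    print(v6)                                    # 1d8eebc5-8e0a-67d7-9669-0800200c9a66
    print(v6_timestamp(v6) == v1.time)
```

Because the version stays where a parser expects it, the value is self-describing and still sorts by timestamp as raw bytes.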

There are two primary issues I have with the current draft (I have many
other comments, but I want to start with these two, and I'm also unsure
how IETF discussion on drafts proceeds, so I'm eager to learn from others):

1. The draft doesn't appear to go into detail about the arrangement of
the bits and how the timestamp should be split to accommodate the
version field, while the earlier version (posted here:
<http://gh.peabody.io/uuidv6/>) does go into this detail.

2. IMO, the alternate text formats do not belong in this
document. I think this document should focus on the version 6 UUID, and
the alternate text formats can be defined in a separate document. The
ULID spec seems like a good specification to draw inspiration from,
since it's compatible with any 128-bit number and already has a number
of implementations. <https://github.com/ulid/spec>
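
To show why a ULID-style text format works for any 128-bit value, here is a minimal Python sketch that encodes a UUID as 26 Crockford base32 characters. The function names are hypothetical, and the sketch covers only the text encoding, not ULID generation.

```python
import uuid

# Crockford base32 alphabet (no I, L, O, or U), as used by the ULID spec.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def to_base32_text(u: uuid.UUID) -> str:
    """Encode the 128-bit value as 26 characters, most significant first."""
    n = u.int
    chars = []
    for _ in range(26):                 # 26 * 5 = 130 bits, enough for 128
        chars.append(ALPHABET[n & 0x1F])
        n >>= 5
    return "".join(reversed(chars))


def from_base32_text(s: str) -> uuid.UUID:
    """Decode 26 Crockford base32 characters back into a UUID."""
    n = 0
    for ch in s.upper():
        n = (n << 5) | ALPHABET.index(ch)
    return uuid.UUID(int=n)


if __name__ == "__main__":
    u = uuid.UUID("58e0a7d7-eebc-11d8-9669-0800200c9a66")
    text = to_base32_text(u)
    print(text, from_base32_text(text) == u)
```

The encoding is independent of how the 128 bits were produced, which is one argument for specifying it separately from the version 6 layout.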

Cheers,
Ben

P.S. Yes, I am aware of privacy concerns with the use of the node field
in version 1 UUIDs. I'm happy to discuss potential use-cases of the node
field that can be used to track where a UUID was minted without
revealing potentially private information, but I don't think the
mechanism for creating the node field should be part of this draft.