UUID version 6 proposal, initial feedback

Ben Ramsey <ben@xxxxxxxxxxxxx> · Mon, 24 Feb 2020 22:09:15 -0600

On 2/4/20 11:36 UTC, Rob Wilton (rwilton) wrote:
> What you describe does sound to me like it could be a new form of
> UUID (if limited to a 128 bit format), and it could potentially also
> be useful.  E.g. a 128 bit UUID that has good database locality
> properties and minimizes the leakage of private information sounds
> useful if it can be reasonably specified and implemented.
> 
> I also note that RFC 4122 is 15 years old, and as Martin previously
> indicated there are security and privacy considerations that have
> evolved over time, hence updating RFC 4122 to make readers aware of
> those considerations also seems like it could potentially be useful.
> 
> Writing this up as a draft sounds like a good next step to see if
> there is enough wider interest.

FYI, Brad has submitted his first draft for review. You can see it here:
https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/

I've been following this for a while, and as the author of a popular
userland UUID library for PHP <https://github.com/ramsey/uuid>, I'd like
to throw my support behind this proposal and describe a few of the pain
points that have led application developers down the path of modifying
the existing UUID structure to better suit their needs.

As a standard, the UUID format is ubiquitous and portable. Despite its
shortcomings, and despite the desire (raised by some on this list) to
create a new standard other than UUID, it remains a desirable format for
many reasons.

There is one primary shortcoming that frequently forces developers to
modify the format, and it is the shortcoming that Brad's version 6 UUID
attempts to overcome: UUIDs do not sort in the order in which they were
created. When developers begin storing UUIDs in relational databases,
they inevitably arrive at one or more of these articles (which I'm
surprised haven't yet been mentioned in this thread):

* http://www.informit.com/articles/printerfriendly/25862
* https://blog.codinghorror.com/primary-keys-ids-versus-guids/
* https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/

As a result, in my PHP library, I have implemented alternate _codecs_ to
encode/decode UUIDs in ways better suited to database fields, especially
for use as primary keys. Two of these codecs are:

* Timestamp-first COMB
* Ordered Time UUID

The timestamp-first COMB is a version 4 UUID whose first 48 bits are
replaced with a Unix timestamp, resulting in a monotonically increasing
UUID. For all intents and purposes, the resulting value still looks like
a version 4 UUID (the version and variant bits remain in the places
defined by RFC 4122).
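A minimal Python sketch of the timestamp-first COMB layout described above. It assumes a millisecond-precision Unix timestamp (the text says only "a Unix timestamp"), and `timestamp_first_comb` is a hypothetical helper, not the PHP library's API. Passing `version=4` to Python's `uuid.UUID` overwrites the version and variant bits, so the value still parses as a version 4 UUID.

```python
import os
import time
import uuid


def timestamp_first_comb(unix_ms=None):
    """Hypothetical helper: a v4-looking UUID whose first 48 bits are a Unix timestamp.

    Assumes millisecond precision; the timestamp is masked to 48 bits.
    """
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    random_bits = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((unix_ms & 0xFFFF_FFFF_FFFF) << 80) | random_bits
    # version=4 sets the version nibble (bits 76-79) and the RFC 4122 variant
    # bits (62-63); this overwrites 6 random bits and leaves the timestamp intact.
    return uuid.UUID(int=value, version=4)


if __name__ == "__main__":
    a = timestamp_first_comb(1_000)
    b = timestamp_first_comb(1_001)
    print(a.version, a < b)  # 4 True
```

Within a single millisecond the order falls back to the random bits, so in this sketch the ordering is monotonic only across distinct timestamps.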

The ordered time UUID is similar but retains the semantics of the
version 1 UUID. That is, the UUID can be deconstructed to produce a node
value, clock sequence value, and timestamp with 100-nanosecond
precision. The difference is that the timestamp fields are rearranged so
that the UUID is monotonically increasing.

The problem with this approach, though, is that the first 2 bytes are
the same as the time_hi_and_version field, which means the UUID version
now occupies the first 4 bits of the UUID. Unless you know how the bits
of this UUID were rearranged, there's no way to reliably tell that it
was originally a version 1 UUID.
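The rearrangement and the ambiguity it creates can be sketched in Python. The byte order used here (time_hi_and_version, then time_mid, then time_low, followed by the unchanged clock sequence and node) is an assumption consistent with the description above, not necessarily the exact codec in the PHP library; `make_v1`, `to_ordered_time`, and `from_ordered_time` are hypothetical helpers, and the clock sequence and node defaults are arbitrary.

```python
import uuid


def make_v1(timestamp, clock_seq=0x0123, node=0x0000_5E00_5301):
    """Hypothetical helper: build a v1 UUID from a 60-bit timestamp (100-ns units)."""
    return uuid.UUID(fields=(
        timestamp & 0xFFFF_FFFF,                # time_low
        (timestamp >> 32) & 0xFFFF,             # time_mid
        ((timestamp >> 48) & 0x0FFF) | 0x1000,  # time_hi_and_version (version 1)
        0x80 | ((clock_seq >> 8) & 0x3F),       # clock_seq_hi_and_reserved (RFC 4122 variant)
        clock_seq & 0xFF,                       # clock_seq_low
        node,                                   # arbitrary 48-bit node
    ))


def to_ordered_time(u):
    """Move time_hi_and_version to the front so byte order follows the timestamp."""
    b = u.bytes  # v1: time_low[0:4] time_mid[4:6] time_hi_and_version[6:8] rest[8:16]
    return b[6:8] + b[4:6] + b[0:4] + b[8:16]


def from_ordered_time(b):
    """Undo the rearrangement; only possible if you know the bytes were reordered."""
    return uuid.UUID(bytes=b[4:8] + b[2:4] + b[0:2] + b[8:16])


if __name__ == "__main__":
    early = make_v1(0x1_FFFF_FFFF)  # time_low = 0xFFFFFFFF, time_mid = 1
    late = make_v1(0x2_0000_0000)   # one tick later: time_low = 0, time_mid = 2
    print(early.bytes < late.bytes)                            # False
    print(to_ordered_time(early) < to_ordered_time(late))      # True
    print(from_ordered_time(to_ordered_time(early)) == early)  # True
    # A standard parser reads the "version" from a nibble that now belongs to time_low.
    print(uuid.UUID(bytes=to_ordered_time(early)).version)     # 15
```

The last line illustrates the problem: once the bytes are reordered, nothing in the value itself says it began as a version 1 UUID.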

Therein lies the problem. The use case calls for a version 1 UUID, from
which an application can retrieve the 100-nanosecond timestamp and node
values, but which also increases monotonically so that it does not
scatter the records in my database engine. But by rearranging the bits
to achieve this, I'm making my application responsible for knowing how
to deconstruct the bits when retrieving them from the database. It isn't
very portable, it's error-prone, and it can lead to developer confusion.

Brad's version 6 UUID solves this problem.
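A sketch of the idea in Python, assuming a layout in which the 60-bit version 1 timestamp is stored most-significant bits first: its top 48 bits, then the version nibble (6), then its low 12 bits, followed by the unchanged variant, clock sequence, and node. `v1_to_v6` and `v6_timestamp` are hypothetical names. Because the version nibble stays where RFC 4122 puts it, any parser can identify the value, and the timestamp can be recovered without application-specific knowledge.

```python
import uuid


def v1_to_v6(u):
    """Reorder a v1 UUID's timestamp so its most significant bits come first."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    ts = u.time  # 60-bit count of 100-ns intervals since 1582-10-15
    value = (
        ((ts >> 12) << 80)                 # top 48 timestamp bits: bytes 0-5
        | (0x6 << 76)                      # version nibble, same position as RFC 4122
        | ((ts & 0x0FFF) << 64)            # low 12 timestamp bits
        | (u.int & 0xFFFF_FFFF_FFFF_FFFF)  # variant, clock_seq, and node unchanged
    )
    return uuid.UUID(int=value)


def v6_timestamp(u):
    """Recover the 60-bit timestamp from the assumed v6 layout."""
    return ((u.int >> 80) << 12) | ((u.int >> 64) & 0x0FFF)


if __name__ == "__main__":
    v1 = uuid.uuid1()
    v6 = v1_to_v6(v1)
    print(v6.version)                   # 6
    print(v6_timestamp(v6) == v1.time)  # True
```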

There are two primary issues I have with the current draft (I have many
other comments, but I want to start with these two, and I'm also unsure
how IETF discussion on drafts proceeds, so I'm eager to learn from others):

1. The draft doesn't appear to go into detail about the arrangement of
the bits and how the timestamp should be split to accommodate the
version field, while the earlier version (posted here:
<http://gh.peabody.io/uuidv6/>) does go into this detail.

2. IMO, the alternate text formats do not belong in this document. This
document should focus on the version 6 UUID, and the alternate text
formats can be defined in a separate document. The ULID spec seems like
a good specification to draw inspiration from, since it can encode any
128-bit number and already has a number of implementations.
<https://github.com/ulid/spec>
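To illustrate why a ULID-style text format can be specified independently of the UUID layout: any 128-bit value, including a UUID, can be written as 26 characters of Crockford's base32, the alphabet the ULID spec uses. The sketch below is a minimal, hypothetical encoder/decoder (`to_base32` and `from_base32` are illustrative names); it skips Crockford's lenient decoding of the ambiguous letters I, L, and O.

```python
import uuid

# Crockford's base32 alphabet (no I, L, O, or U), in ascending ASCII order.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def to_base32(u):
    """Encode a UUID's 128 bits as 26 characters (130 bits; the top 2 are always zero)."""
    n = u.int
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[n & 0x1F])
        n >>= 5
    return "".join(reversed(chars))


def from_base32(text):
    """Decode 26 base32 characters back into a UUID (case-insensitive)."""
    if len(text) != 26:
        raise ValueError("expected 26 characters")
    n = 0
    for ch in text.upper():
        n = (n << 5) | ALPHABET.index(ch)  # raises ValueError on invalid characters
    return uuid.UUID(int=n)  # raises ValueError if the value exceeds 128 bits


if __name__ == "__main__":
    u = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
    s = to_base32(u)
    print(len(s), from_base32(s) == u)  # 26 True
```

Because the output has a fixed length and the alphabet is in ascending order, sorting the strings sorts the underlying 128-bit values, so a time-ordered UUID stays time-ordered in this text form.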

Cheers,
Ben

P.S. Yes, I am aware of the privacy concerns with the use of the node
field in version 1 UUIDs. I'm happy to discuss potential uses of the
node field to track where a UUID was minted without revealing
potentially private information, but I don't think the mechanism for
creating the node field should be part of this draft.
