Re: UUID version 6 proposal, initial feedback

George Michaelson <ggm@xxxxxxxxxxxx> · Thu, 27 Feb 2020 08:38:15 +1000

For every example (in some things) there is a counter example. As a
thought experiment I have tried to think what is the counter-example
for a DB-aligned UUID. The only one I can hypothesize is something
around information leakage: an observer can infer that two UUIDs
reside in the same logical data frame.

So I think it is likely that more real, tangible reasons NOT to do
this exist, alongside the proven reasons to do it deliberately (e.g.
sharding, Bloom filters over discrete sets, parallelism...)

Therefore, I hope any work on UUIDs is additive, not subtractive or
solely re-definitional to *one* form, because, well... for every UUID
structural example, there is probably a counter-example.

-G

On Thu, Feb 27, 2020 at 5:09 AM Laurence Lundblade
<lgl@xxxxxxxxxxxxxxxxx> wrote:
>
> The UUID format seems somewhat anachronistic going back to a time when good HW RNGs were uncommon. That’s not true today. HW was introduced between 2010 and 2015 for good RNGs, particularly this  The better choice today seems just a sequence of cryptographic quality random bytes.  This is already being done for a nonce field in some protocols. They are not UUID format.
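
A minimal sketch of the "just random bytes" option, assuming Python's
standard secrets module as the CSPRNG source. The name random_id and
the 16-byte default are illustrative only, not taken from any protocol:

```python
import secrets


def random_id(num_bytes: int = 16) -> bytes:
    """Return num_bytes of cryptographic-quality random bytes.

    There are no version or variant bits, so all 8 * num_bytes bits are
    random and the length is not tied to the 128-bit UUID format.
    """
    return secrets.token_bytes(num_bytes)


print(random_id().hex())      # 32 hex digits, different on every call
print(random_id(32).hex())    # a longer ID is just a parameter change
```
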
>
> Except, as discussed, true RNG IDs don’t work well with databases.
>
> One option is to say the databases should be fixed to work with true RNG IDs. In some cases this is what will have to happen.
>
> Another is to design an ID that is database friendly, which is what this draft is about. Protocols can use that if they like and they can meet the generation requirements. Generation requirements seem to require a clock or stored coordinate state.
>
> For a new ID, it might be worth breaking free from the UUID format to design a better database-friendly ID. The UUID format doesn’t seem particularly necessary nowadays. It might be good to allow for more bits too (UUIDs are fixed at slightly less than 128).
>
> LL
>
>
>
> On Feb 24, 2020, at 8:09 PM, Ben Ramsey <ben@xxxxxxxxxxxxx> wrote:
>
> On 2/4/20 11:36 UTC, Rob Wilton (rwilton) wrote:
>
> What you describe does sound to me like it could be a new form of
> UUID (if limited to a 128 bit format), and it could potentially also
> be useful.  E.g. a 128 bit UUID that has good database locality
> properties and minimizes the leakage of private information sounds
> useful if it can be reasonably specified and implemented.
>
> I also note that RFC 4122 is 15 years old, and as Martin previously
> indicated there are security and privacy considerations that have
> evolved over time, hence updating RFC 4122 to make readers aware of
> those considerations also seems like it could potentially be useful.
>
> Writing this up as a draft sounds like a good next step to see if
> there is enough wider interest.
>
>
>
> FYI, Brad has submitted his first draft for review. You can see it here:
> https://datatracker.ietf.org/doc/draft-peabody-dispatch-new-uuid-format/
>
> I've been following this for a while, and as the author of a popular
> userland UUID library for PHP <https://github.com/ramsey/uuid>, I'd like
> to throw my support behind this proposal and describe a few of the pain
> points that have led application developers down the path of modifying
> the existing UUID structure to better suit their needs.
>
> As a standard, the UUID format is ubiquitous and portable. Despite some
> of its shortcomings, and the desire (as some have raised on this list)
> to create a new standard other than UUID, it's a desirable format, for
> many reasons.
>
> There is one primary shortcoming that results in a frequent need to
> modify the format, and this is the shortcoming that Brad's version 6
> UUID attempts to overcome. When developers begin storing UUIDs in
> relational databases, they inevitably arrive at one or all of these
> articles (which I'm surprised haven't yet been mentioned in this thread):
>
> * http://www.informit.com/articles/printerfriendly/25862
> * https://blog.codinghorror.com/primary-keys-ids-versus-guids/
> * https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/
>
> As a result, in my PHP library, I have implemented alternate _codecs_ to
> encode/decode UUIDs in more optimal ways for database fields, especially
> for use as primary keys. Two of these codecs are:
>
> * Timestamp-first COMB
> * Ordered Time UUID
>
> The timestamp-first COMB is a version 4 UUID combined with a Unix
> timestamp as the first 48 bits, resulting in a monotonically-increasing
> UUID. For all intents and purposes, the resulting value always looks
> like a version 4 UUID (the version and variant bits remain in the same
> places as defined by RFC 4122).
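
To make the COMB layout concrete, here is a minimal sketch in Python
(not the PHP library's actual code). It assumes the 48-bit prefix is a
Unix timestamp in milliseconds; the name timestamp_first_comb and the
unix_ms parameter are hypothetical:

```python
import os
import time
import uuid


def timestamp_first_comb(unix_ms=None):
    """Build a version 4 style UUID whose first 48 bits are a timestamp."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    # Bytes 0-5: timestamp, big-endian so byte order matches time order.
    prefix = (unix_ms & ((1 << 48) - 1)).to_bytes(6, "big")
    # Bytes 6-15: random, with version and variant bits forced below.
    rest = bytearray(os.urandom(10))
    rest[0] = (rest[0] & 0x0F) | 0x40  # byte 6: version nibble = 4
    rest[2] = (rest[2] & 0x3F) | 0x80  # byte 8: RFC 4122 variant (10xx)
    return uuid.UUID(bytes=prefix + bytes(rest))


u = timestamp_first_comb()
print(u, u.version)
```

Values minted in different milliseconds sort by creation time. Within
the same millisecond the order is random, so under this assumption the
result is only roughly monotonic.
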
>
> The ordered time UUID is similar but retains the semantics of the
> version 1 UUID. That is, the UUID can be deconstructed to produce a node
> value, clock sequence value, and timestamp with 100-nanosecond fidelity. The
> difference is that the timestamp is rearranged so that the UUID is
> monotonically increasing.
>
> The problem with this approach, though, is that the first 2 bytes are
> the same as the time_hi_and_version field, which means the UUID version
> now occupies the first 4 bits of the UUID. Unless you know how the bits
> of this UUID were rearranged, there's no way to reliably tell that it
> was originally a version 1 UUID.
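
The rearrangement described above can be sketched as follows, assuming
the byte order used in the Percona article (time_hi_and_version, then
time_mid, then time_low). The helper names are hypothetical and this is
not the library's codec itself:

```python
import uuid


def encode_ordered_time(u: uuid.UUID) -> bytes:
    """Reorder version 1 bytes so the stored value sorts by timestamp."""
    b = u.bytes
    # v1 layout: time_low (0-3), time_mid (4-5), time_hi_and_version (6-7),
    # then clock sequence and node (8-15), which stay where they are.
    return b[6:8] + b[4:6] + b[0:4] + b[8:16]


def decode_ordered_time(stored: bytes) -> uuid.UUID:
    """Undo encode_ordered_time and recover the original version 1 UUID."""
    return uuid.UUID(bytes=stored[4:8] + stored[2:4] + stored[0:2] + stored[8:16])


original = uuid.uuid1()
stored = encode_ordered_time(original)
print(stored.hex())                            # first hex digit is the version, 1
print(decode_ordered_time(stored) == original)
```

The stored bytes carry no marker saying they were rearranged, so a
reader has to know the codec in order to call decode_ordered_time.
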
>
> Therein lies the problem. The use-case is for a version 1 UUID, from
> which an application can retrieve the 100-nanosecond timestamp and node values,
> while being monotonically increasing so that it does not scatter the
> records in my database engine. But, by rearranging the bits to achieve
> this, I'm placing a dependency on my application to know how to
> deconstruct the bits when retrieving from the database. That approach is
> not very portable, is error-prone, and can lead to developer confusion.
>
> Brad's version 6 UUID solves this problem.
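
As a rough illustration of how a version 6 value could be derived from
a version 1 value, here is a sketch in Python. The bit layout is my
assumption, not text from the draft: the 60-bit timestamp is stored
most-significant bits first, and the version nibble stays at its
RFC 4122 position. The name v1_to_v6 is hypothetical:

```python
import uuid


def v1_to_v6(u: uuid.UUID) -> uuid.UUID:
    """Re-pack a version 1 UUID with its timestamp in big-endian order."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    ts = u.time                                  # 60-bit count of 100 ns ticks
    time_high = (ts >> 28) & 0xFFFFFFFF          # top 32 bits of the timestamp
    time_mid = (ts >> 12) & 0xFFFF               # next 16 bits
    time_low_and_version = 0x6000 | (ts & 0x0FFF)  # version 6 + low 12 bits
    return uuid.UUID(fields=(
        time_high,
        time_mid,
        time_low_and_version,
        u.clock_seq_hi_variant,                  # clock sequence unchanged
        u.clock_seq_low,
        u.node,                                  # node unchanged
    ))


v1 = uuid.UUID("c232ab00-9414-11ec-b3c8-9f6bdeced846")
print(v1_to_v6(v1))  # 1ec9414c-232a-6b00-b3c8-9f6bdeced846
```

Under this layout the version nibble stays where existing parsers look
for it, and the raw bytes sort in timestamp order without a codec.
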
>
> There are two primary issues I have with the current draft (I have many
> other comments, but I want to start with these two, and I'm also unsure
> how IETF discussion on drafts proceeds, so I'm eager to learn from others):
>
> 1. The draft doesn't appear to go into detail about the arrangement of
> the bits and how the timestamp should be split to accommodate the
> version field, while the earlier version (posted here:
> <http://gh.peabody.io/uuidv6/>) does go into this detail.
>
> 2. IMO, the alternate text formats do not belong in this
> document. I think this document should focus on the version 6 UUID, and
> the alternate text formats can be defined in a separate document. The
> ULID spec seems like a good specification to draw inspiration from,
> since it's compatible with any 128-bit number and already has a number
> of implementations. <https://github.com/ulid/spec>
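
On the text-format point, a small sketch of why a ULID-style encoding
works for any 128-bit number: it is Crockford base32 over the integer
value, 26 characters long. The function names are hypothetical and the
code only illustrates the encoding; it is not a full ULID
implementation:

```python
import uuid

# Crockford base32 alphabet: digits and letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_128(value: int) -> str:
    """Encode a 128-bit integer as 26 Crockford base32 characters."""
    if not 0 <= value < (1 << 128):
        raise ValueError("value must fit in 128 bits")
    chars = []
    for _ in range(26):              # 26 * 5 = 130 bits, top 2 always zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def decode_128(text: str) -> int:
    """Decode 26 Crockford base32 characters back to a 128-bit integer."""
    if len(text) != 26:
        raise ValueError("expected 26 characters")
    value = 0
    for ch in text.upper():
        value = (value << 5) | CROCKFORD.index(ch)
    if value >= (1 << 128):
        raise ValueError("value does not fit in 128 bits")
    return value


u = uuid.uuid4()
text = encode_128(u.int)
print(text, uuid.UUID(int=decode_128(text)) == u)
```

Because the alphabet is in ASCII order and the output has a fixed
length, the encoded strings sort the same way as the underlying
integers.
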
>
> Cheers,
> Ben
>
> P.S. Yes, I am aware of privacy concerns with the use of the node field
> in version 1 UUIDs. I'm happy to discuss potential use-cases of the node
> field that can be used to track where a UUID was minted without
> revealing potentially private information, but I don't think the
> mechanism for creating the node field should be part of this draft.