Re: Sequence vs UUID

Merlin Moncure <mmoncure@xxxxxxxxx> · Thu, 26 Jan 2023 16:42:57 -0600

On Thu, Jan 26, 2023 at 3:50 PM Rob Sargent <robjsargent@xxxxxxxxx> wrote:

    On 1/26/23 14:36, Merlin Moncure wrote:

        On Thu, Jan 26, 2023 at 1:18 PM veem v <veema0000@xxxxxxxxx>
          wrote:

              Hello, We were trying to
                  understand whether we should use UUID or Sequence in
                  general for primary keys. In many of the blogs(one is
                  below) across multiple databases, I saw over the
                  internet and all are mostly stating the sequence is
                  better as compared to UUID. But I think in the case of
                  concurrent data load scenarios UUID will spread the
                  contention point whereas sequence can be a single
                  point of contention. 
              So
                    we want to understand from experts here, if
                  there are any
                    clear rules available or if we have
                    any pros vs cons list available for each of those to
                    understand the exact scenario in which we should go
                    for one over other?
                  Basically I wanted
                    to see if we can perform some test on sample data to
                    see the percentage of overhead on read and write
                    performances of the query in presence of UUID VS
                    Sequence to draw some conclusion in general? And
                    also considering open source postgres as the base
                    for many databases like redshift etc, so the results
                    which apply to progress would apply to others as
                    well.

              https://www.percona.com/blog/2019/11/22/uuids-are-popular-but-bad-for-performance-lets-discuss/

          * UUIDs are big for identifer (16 bytes) sequence is 4 or
            8
          * sequences are NOT a contention point, database uses
            some tricks to work around that

            * UUIDS are basically random data causing page
            fragmentation.  this is particularly bad in auto clustering
            architectures like sql server 
          * Also, UUIDS can deliver very poor buffer hit ratios
            when sweeping ranges of records on large tables.   

    "Sweeping ranges of records" by ID suggests you have information in
    your id.  If you're  comfortable with sequence generation as a
    surrogate for time-point of entry, by all means have at it.

Yeah, sequences preserve time locality.   Clustering the table on the primary key would then optimize certain cases.   Clustering is useless on guids.  Sequences do better here and hybrid natural key models can do better still.  Buffer hit ratio and read amplification can be major issues if not understood and controlled for.

    For many of us, contemporaneously generated records have nothing to
    do with each other. (And they carry attributes which groups them.)

    UUIDs do a very good job of avoiding id-overlap across domains
    (compare to a sequence for each domain).

 While true, there are other solutions to the issue, one or more of,
* separating id generation from storage
* reserving id ranges
* creating domain id composite alongside generated id

Relying solely on guid to generate uniqueness seems like a broken model anyways in the general case.   Whatever thing is responsible for suppressing information conflicts ought to be emitting the id.

merlin