On Tue, 22 Dec 2015 11:07:30 -0600 Brendan McCollam <bmccollam@xxxxxxxxxxxx> wrote: > (previously posted to the pgsql-performance list) > > Hello, > > We're in the process of designing the database for a new service, and > some of our tables are going to be using UUID primary key columns. > > We're trying to decide between: > > * UUIDv1 (timestamp/MAC uuid) and > > * UUIDv4 (random uuid) > > And the separate but related choice between: > > * Generating the UUIDs client-side with the Python uuid library > (https://docs.python.org/2/library/uuid.html) or > > * Letting PostgreSQL handle uuid creation with the uuid-ossp extension > (http://www.postgresql.org/docs/9.4/static/uuid-ossp.html) > > In terms of insert and indexing/retrieval performance, is there one > clearly superior approach? If not, could somebody speak to the > performance tradeoffs of different approaches? > > There seem to be sources online (e.g. > https://blog.starkandwayne.com/2015/05/23/uuid-primary-keys-in-postgresql/ > http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/) > that claim that UUIDv4 (random) will lead to damaging keyspace > fragmentation and using UUIDv1 will avoid this. There's no substance to these claims. Chasing the links around we finally find this article: http://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/ which makes the reasonable argument that random primary keys can cause performance robbing fragmentation on clustered indexes. But Postgres doesn't _have_ clustered indexes, so that article doesn't apply at all. The other authors appear to have missed this important point. One could make the argument that the index itself becomming fragmented could cause some performance degredation, but I've yet to see any convincing evidence that index fragmentation produces any measurable performance issues (my own experiments have been inconclusive). Looking at it another way, a quick experiment shows that PG can fit about 180 UUID primary keys per database page, which means a million row table will use about 5600 pages to the tune of about 46m. On modern hardware, that index is likely to be wholly in memory all the time. If your performance requirements are really so dire, then you should probably consider ditching UUIDs as keys. Taking the same million row table I postulated in the previous paragraph, but using ints insted of UUIDs for the primary key, the primary key index would be about 3200 pages (~26m) ... or almost 1/2 the size -- making it more likely to all be in memory at any point in time. I seriously doubt that trying to make your UUIDs generate in a predictable fashon will produce any measurable improvement, and I see no evidence in the articles you cited that claims otherwise have any real basis or were made by anyone knowledgeable enough to know. -- Bill Moran -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general