Re: Standard uuid vs. custom data type uuid_v1

Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> · Sat, 27 Jul 2019 15:47:36 +0200

On Thu, Jul 25, 2019 at 11:26:23AM +0200, Ancoron Luciferis wrote:
Hi,

I have finally found some time to implement a custom data type optimized
for version 1 UUIDs (timestamp, clock sequence, node):
https://github.com/ancoron/pg-uuid-v1

Some tests (using a few million rows) have shown the following
results (when used as a primary key):

COPY ... FROM: ~7.8x faster (from file - SSD)
COPY ... TO  : ~1.5x faster (no WHERE clause, sequential output)

The best thing is that for INSERTs there is a very high chance of
hitting the B-Tree "fastpath", because the timestamp is the most
significant part of the data type and tends to be increasing.

This also results in much lower "bloat": where the standard "uuid" type
easily goes beyond 30%, "uuid_v1" should stay between 10 and 20%.

Additionally, it also reveals the massive performance degradation I saw
in my tests for standard UUIDs:

Initial 200 million rows: ~ 80k rows / second
Additional 17 million rows: ~26k rows / second

...and the new data type:
Initial 200 million rows: ~ 623k rows / second
Additional 17 million rows: ~618k rows / second

Presumably, the new data type is sorted in a way that eliminates/reduces
random I/O against the index. But maybe that's not the case - hard to
say, because the linked results don't say how the data files were
generated ...
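
To illustrate the sort-order point, here is a minimal Python sketch. It
assumes the new type compares the timestamp first, then the clock
sequence, then the node, which is how I read the description above; I
have not checked the extension's actual comparison code. The helpers
make_v1 and time_ordered_key are hypothetical names used only for this
illustration. The standard layout stores time_low in the leading bytes,
so a byte-wise comparison of v1 UUIDs does not follow generation time.

```python
import uuid


def make_v1(timestamp: int, clock_seq: int = 0,
            node: int = 0x0123456789AB) -> uuid.UUID:
    """Build a version 1 UUID from a 60-bit timestamp (hypothetical helper)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def time_ordered_key(u: uuid.UUID) -> bytes:
    """Return a 16-byte key with the timestamp as the most significant part.

    Assumed ordering: timestamp, then clock sequence, then node.
    """
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    return (u.time.to_bytes(8, "big")
            + u.clock_seq.to_bytes(2, "big")
            + u.node.to_bytes(6, "big"))


# Timestamps chosen so that the low 32 bits wrap around several times.
base = 0x1E9B0A5FFFFFFF0
uuids = [make_v1(base + i * 0x10000000) for i in range(32)]

# Order as a byte-wise comparison of the standard layout would see it.
by_standard_bytes = sorted(uuids, key=lambda u: u.bytes)
# Order with the timestamp moved to the front.
by_time_key = sorted(uuids, key=time_ordered_key)

print("standard byte order is chronological:", by_standard_bytes == uuids)
print("time-ordered key is chronological:   ", by_time_key == uuids)
```

If the data files were generated in roughly chronological order, keys
ordered like time_ordered_key would land on the rightmost index page,
while the standard byte order would scatter them across the index.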

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services