On 27/07/2019 15:47, Tomas Vondra wrote:
> On Thu, Jul 25, 2019 at 11:26:23AM +0200, Ancoron Luciferis wrote:
>> Hi,
>>
>> I have finally found some time to implement a custom data type optimized
>> for version 1 UUIDs (timestamp, clock sequence, node):
>> https://github.com/ancoron/pg-uuid-v1
>>
>> Some tests (using a few million rows) have shown the following results
>> (when used as a primary key):
>>
>> COPY ... FROM: ~7.8x faster (from file - SSD)
>> COPY ... TO:   ~1.5x faster (no WHERE clause, sequential output)
>>
>> The best thing is that INSERTs have a very high chance of hitting the
>> B-Tree "fastpath", because the timestamp is the most significant part
>> of the data type and tends to be increasing.
>>
>> This also results in much lower "bloat": where the standard "uuid" type
>> easily goes beyond 30%, "uuid_v1" should stay between 10 and 20%.
>>
>> Additionally, it also reveals the massive performance degradation I saw
>> in my tests for standard UUIDs:
>>
>> Initial 200 million rows:   ~80k rows/second
>> Additional 17 million rows: ~26k rows/second
>>
>> ...and the new data type:
>> Initial 200 million rows:   ~623k rows/second
>> Additional 17 million rows: ~618k rows/second
>>
>
> Presumably, the new data type is sorted in a way that eliminates/reduces
> random I/O against the index. But maybe that's not the case - hard to
> say, because the linked results don't say how the data files were
> generated ...
>

Yes, by definition, version 1 UUIDs contain the timestamp, and in applications that is usually the "current" timestamp at the moment an entry is prepared for storage. For testing, I therefore simulated 9 nodes with increasing timestamps, which results in behavior very similar to sequentially increasing values.

The numbers above were calculated from "\timing" output in logged psql test sessions (so the small client overhead is included).
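To illustrate why byte order and time order disagree for the standard type: a v1 UUID stores its 60-bit timestamp split as time_low/time_mid/time_hi, with the *low* part first. A minimal sketch of the field rearrangement (hypothetical - not the actual on-disk layout of pg-uuid-v1) that makes the timestamp the most significant part:

```python
import uuid

def v1_sort_key(u: uuid.UUID) -> bytes:
    """Build a 16-byte key with the 60-bit timestamp first, followed by
    clock sequence and node, so that byte order follows time order.
    Sketch only - the real pg-uuid-v1 layout may differ."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # u.time is the 60-bit count of 100-ns intervals since 1582-10-15,
    # already reassembled from time_hi/time_mid/time_low by the stdlib.
    return (u.time.to_bytes(8, "big")
            + u.clock_seq.to_bytes(2, "big")
            + u.node.to_bytes(6, "big"))

# Two UUIDs from the generated test data: b was created later than a,
# but its text (and byte) representation sorts *before* a.
a = uuid.UUID("fb2893ae-9265-11e8-90d8-e03f496e733b")
b = uuid.UUID("00393a28-9266-11e8-af1b-e03f4957fa73")
assert str(b) < str(a)                   # raw order: b sorts first
assert v1_sort_key(b) > v1_sort_key(a)   # time order: b is actually later
```

With the raw layout, each insert can land anywhere in the index (random I/O); with the rearranged key, inserts go to the rightmost leaf, which is what enables the B-Tree fastpath.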
Internal behavior has also been tracked with Linux "perf record" and visualized as flame graphs.

Test data has been generated by the following Java (Maven) project:
https://github.com/ancoron/java-uuid-serial

...with arguments:

mvn clean test \
    -Duuids=217000000 \      # number of UUIDs
    -Duuid.skip.v1=false \   # generate V1 UUIDs and serial
    -Dnodes=9 \              # how many nodes to simulate
    -Duuid.historic=true \   # enable historic mode (not "now")
    -Dinterval_days=365      # range of days to generate

...which will generate V1 UUIDs into a file "target/uuids.v1.historic.txt" as follows:

fb2893ae-9265-11e8-90d8-e03f496e733b
fb2c3d2e-9265-11e8-a131-e03f49777cbb
fda90b33-9265-11e8-af1b-e03f4957fa73
fdaba343-9265-11e8-b648-e03f49e7fd77
fdad50f3-9265-11e8-b9be-e03f49de7ab7
fdaf73d3-9265-11e8-bdce-e03f49dff937
fdb25a03-9265-11e8-a0d8-e03f49c67ff3
fdb28113-9265-11e8-9c15-e03f4976fd73
fdb2a823-9265-11e8-8273-e03f49d6f3f7
fdb8c2a3-9265-11e8-90d8-e03f496e733b
fdbc6c23-9265-11e8-a131-e03f49777cbb
00393a28-9266-11e8-af1b-e03f4957fa73
003bd238-9266-11e8-b648-e03f49e7fd77
003d7fe8-9266-11e8-b9be-e03f49de7ab7
003fa2c8-9266-11e8-bdce-e03f49dff937
004288f8-9266-11e8-a0d8-e03f49c67ff3
0042b008-9266-11e8-9c15-e03f4976fd73
0042d718-9266-11e8-8273-e03f49d6f3f7
0048f198-9266-11e8-90d8-e03f496e733b

...so yes, they are ever-increasing from a timestamp perspective, which is what I intended to optimize for in this first iteration. I'll do some more testing with slight time offsets between the nodes, to simulate nodes whose clocks run a few seconds behind others.

I might re-implement the UUID test data generator in Python, as that would be far more usable for others.

Cheers,

	Ancoron
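P.S.: a minimal sketch of what such a Python generator could look like - simulated nodes, monotonically increasing timestamps, plain v1 field encoding. All names and the timestamp-step logic are my own assumptions, not ported from java-uuid-serial:

```python
import random

# 100-ns intervals between the Gregorian epoch (1582-10-15) and 1970-01-01
GREGORIAN_OFFSET = 0x01B21DD213814000

def format_v1(ts_100ns: int, clock_seq: int, node: int) -> str:
    """Encode a version 1 UUID from an explicit 60-bit timestamp."""
    time_low = ts_100ns & 0xFFFFFFFF
    time_mid = (ts_100ns >> 32) & 0xFFFF
    time_hi = ((ts_100ns >> 48) & 0x0FFF) | 0x1000   # version 1 nibble
    clock_hi = ((clock_seq >> 8) & 0x3F) | 0x80      # RFC 4122 variant bits
    clock_lo = clock_seq & 0xFF
    return (f"{time_low:08x}-{time_mid:04x}-{time_hi:04x}-"
            f"{clock_hi:02x}{clock_lo:02x}-{node:012x}")

def generate(count: int, nodes: int = 9, start_ts: int = GREGORIAN_OFFSET):
    """Yield ever-increasing V1 UUIDs, round-robin across simulated nodes."""
    # Random 48-bit node IDs with the multicast bit set (as for non-MAC nodes)
    node_ids = [random.getrandbits(48) | (1 << 40) for _ in range(nodes)]
    clock_seqs = [random.getrandbits(14) for _ in range(nodes)]
    ts = start_ts
    for i in range(count):
        ts += random.randint(1, 50_000)  # strictly advance global time
        n = i % nodes                    # pick the next simulated node
        yield format_v1(ts, clock_seqs[n], node_ids[n])
```

This sketch advances one global clock, so the output is strictly time-ordered; per-node clock offsets (for the "some nodes a few seconds behind" scenario) would just add a fixed skew per node before encoding.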