On 27/07/2019 15:47, Tomas Vondra wrote:
> On Thu, Jul 25, 2019 at 11:26:23AM +0200, Ancoron Luciferis wrote:
>> Hi,
>>
>> I have finally found some time to implement a custom data type optimized
>> for version 1 UUIDs (timestamp, clock sequence, node):
>> https://github.com/ancoron/pg-uuid-v1
>>
>> Some tests (using a few million rows) have shown the following results
>> (when used as a primary key):
>>
>> COPY ... FROM: ~7.8x faster (from file - SSD)
>> COPY ... TO:   ~1.5x faster (no WHERE clause, sequential output)
>>
>> The best thing is that INSERTs have a very high chance of hitting the
>> B-Tree "fastpath", because the timestamp is the most significant part
>> of the data type and tends to be increasing.
>>
>> This also results in much lower "bloat": where the standard "uuid" type
>> easily goes beyond 30%, "uuid_v1" should stay between 10 and 20%.
>>
>> Additionally, it also reveals the massive performance degradation I saw
>> in my tests for standard UUIDs:
>>
>> Initial 200 million rows:   ~80k rows/second
>> Additional 17 million rows: ~26k rows/second
>>
>> ...and the new data type:
>> Initial 200 million rows:   ~623k rows/second
>> Additional 17 million rows: ~618k rows/second
>>
>
> Presumably, the new data type is sorted in a way that eliminates/reduces
> random I/O against the index. But maybe that's not the case - hard to
> say, because the linked results don't say how the data files were
> generated ...
>

Yes, by definition, version 1 UUIDs contain the timestamp, and in applications that is usually the "current" timestamp at the moment an entry is prepared for storage. For testing, I therefore simulated 9 nodes with increasing timestamps, which results in behavior very similar to sequentially increasing values.

The numbers above were calculated from "\timing" output in logged psql test sessions (so the small client overhead is included).
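To illustrate why byte order and time order disagree for the standard type: a v1 UUID stores its 60-bit timestamp split as time_low/time_mid/time_hi, with the *low* part first. A minimal sketch of the field rearrangement (hypothetical - not the actual on-disk layout of pg-uuid-v1) that makes the timestamp the most significant part:

```python
import uuid

def v1_sort_key(u: uuid.UUID) -> bytes:
    """Build a 16-byte key with the 60-bit timestamp first, followed by
    clock sequence and node, so that byte order follows time order.
    Sketch only - the real pg-uuid-v1 layout may differ."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # u.time is the 60-bit count of 100-ns intervals since 1582-10-15,
    # already reassembled from time_hi/time_mid/time_low by the stdlib.
    return (u.time.to_bytes(8, "big")
            + u.clock_seq.to_bytes(2, "big")
            + u.node.to_bytes(6, "big"))

# Two UUIDs from the generated test data: b was created later than a,
# but its text (and byte) representation sorts *before* a.
a = uuid.UUID("fb2893ae-9265-11e8-90d8-e03f496e733b")
b = uuid.UUID("00393a28-9266-11e8-af1b-e03f4957fa73")
assert str(b) < str(a)                   # raw order: b sorts first
assert v1_sort_key(b) > v1_sort_key(a)   # time order: b is actually later
```

With the raw layout, each insert can land anywhere in the index (random I/O); with the rearranged key, inserts go to the rightmost leaf, which is what enables the B-Tree fastpath.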
Internal behavior has also been tracked with Linux "perf record" and visualized as flame graphs.

Test data has been generated by the following Java (Maven) project:
https://github.com/ancoron/java-uuid-serial

...with arguments:

mvn clean test \
    -Duuids=217000000 \      # number of UUIDs
    -Duuid.skip.v1=false \   # generate V1 UUIDs and serial
    -Dnodes=9 \              # how many nodes to simulate
    -Duuid.historic=true \   # enable historic mode (not "now")
    -Dinterval_days=365      # range of days to generate

...which will generate V1 UUIDs into a file "target/uuids.v1.historic.txt" as follows:

fb2893ae-9265-11e8-90d8-e03f496e733b
fb2c3d2e-9265-11e8-a131-e03f49777cbb
fda90b33-9265-11e8-af1b-e03f4957fa73
fdaba343-9265-11e8-b648-e03f49e7fd77
fdad50f3-9265-11e8-b9be-e03f49de7ab7
fdaf73d3-9265-11e8-bdce-e03f49dff937
fdb25a03-9265-11e8-a0d8-e03f49c67ff3
fdb28113-9265-11e8-9c15-e03f4976fd73
fdb2a823-9265-11e8-8273-e03f49d6f3f7
fdb8c2a3-9265-11e8-90d8-e03f496e733b
fdbc6c23-9265-11e8-a131-e03f49777cbb
00393a28-9266-11e8-af1b-e03f4957fa73
003bd238-9266-11e8-b648-e03f49e7fd77
003d7fe8-9266-11e8-b9be-e03f49de7ab7
003fa2c8-9266-11e8-bdce-e03f49dff937
004288f8-9266-11e8-a0d8-e03f49c67ff3
0042b008-9266-11e8-9c15-e03f4976fd73
0042d718-9266-11e8-8273-e03f49d6f3f7
0048f198-9266-11e8-90d8-e03f496e733b

...so yes, they are ever-increasing from a timestamp perspective, which is what I intended to optimize for in this first iteration. I'll do some more testing with slight time offsets between the nodes, to simulate nodes whose clocks run a few seconds behind others.

I might re-implement the UUID test data generator in Python, as that would be far more usable for others.

Cheers,

	Ancoron
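P.S.: a minimal sketch of what such a Python generator could look like - simulated nodes, monotonically increasing timestamps, plain v1 field encoding. All names and the timestamp-step logic are my own assumptions, not ported from java-uuid-serial:

```python
import random

# 100-ns intervals between the Gregorian epoch (1582-10-15) and 1970-01-01
GREGORIAN_OFFSET = 0x01B21DD213814000

def format_v1(ts_100ns: int, clock_seq: int, node: int) -> str:
    """Encode a version 1 UUID from an explicit 60-bit timestamp."""
    time_low = ts_100ns & 0xFFFFFFFF
    time_mid = (ts_100ns >> 32) & 0xFFFF
    time_hi = ((ts_100ns >> 48) & 0x0FFF) | 0x1000   # version 1 nibble
    clock_hi = ((clock_seq >> 8) & 0x3F) | 0x80      # RFC 4122 variant bits
    clock_lo = clock_seq & 0xFF
    return (f"{time_low:08x}-{time_mid:04x}-{time_hi:04x}-"
            f"{clock_hi:02x}{clock_lo:02x}-{node:012x}")

def generate(count: int, nodes: int = 9, start_ts: int = GREGORIAN_OFFSET):
    """Yield ever-increasing V1 UUIDs, round-robin across simulated nodes."""
    # Random 48-bit node IDs with the multicast bit set (as for non-MAC nodes)
    node_ids = [random.getrandbits(48) | (1 << 40) for _ in range(nodes)]
    clock_seqs = [random.getrandbits(14) for _ in range(nodes)]
    ts = start_ts
    for i in range(count):
        ts += random.randint(1, 50_000)  # strictly advance global time
        n = i % nodes                    # pick the next simulated node
        yield format_v1(ts, clock_seqs[n], node_ids[n])
```

This sketch advances one global clock, so the output is strictly time-ordered; per-node clock offsets (for the "some nodes a few seconds behind" scenario) would just add a fixed skew per node before encoding.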