Re: UUID v1 optimizations...

Ancoron Luciferis <ancoron.luciferis@xxxxxxxxxxxxxx> · Wed, 10 Jul 2019 00:08:34 +0200

On 08/07/2019 02:26, Peter Geoghegan wrote:
> Please don't top post -- trim the your response down so that only
> still-relevant text remains.
> 
> On Tue, Jun 11, 2019 at 1:27 PM Ancoron Luciferis
> <ancoron.luciferis@xxxxxxxxxxxxxx> wrote:
>> Primary key indexes after an ANALYZE:
>>     table_name     |     bloat      | index_mb | table_mb
>> -------------------+----------------+----------+----------
>>  uuid_v1           | 767 MiB (49 %) | 1571.039 | 1689.195
>>  uuid_v1_timestamp | 768 MiB (49 %) | 1571.039 | 1689.195
>>  uuid_seq          | 759 MiB (49 %) | 1562.766 | 1689.195
>>  uuid_serial       | 700 MiB (47 %) | 1504.047 | 1689.195
>>
>> OK, sadly no reclaim in any of them.
> 
> I don't know how you got these figures, but most likely they don't
> take into account the fact that the FSM for the index has free blocks
> available. You'll only notice that if you have additional page splits
> that can recycle that space. Or, you could use pg_freespacemap to get
> some idea.

Hm, I think I've already read quite a bit about the internals of the PG
b-tree index implementation but still cannot get to the answer how I
could influence that on my end as I want to stay compatible with the
standard UUID data storage but need time sorting support.

Anyway, I've made a bit of progress in testing and now have the full
tests executing unattended with the help of a script:
https://github.com/ancoron/pg-uuid-test

I've uploaded one of the test run results here:
https://gist.github.com/ancoron/d5114b0907e8974b6808077e02f8d109

After the first mass deletion, I can now see quite some savings for
both, serial and for my new time-sorted index:
 table_name  |      bloat      | index_mb | table_mb
-------------+-----------------+----------+----------
 uuid_v1     | 1500 MiB (48 %) | 3106.406 | 3378.383
 uuid_serial | 800 MiB (33 %)  | 2406.453 | 3378.383
 uuid_v1_ext | 800 MiB (33 %)  | 2406.453 | 3378.383

...but in a second case (DELETE old + INSERT new), the savings are gone
again in both cases:
 table_name  |      bloat      | index_mb | table_mb
-------------+-----------------+----------+----------
 uuid_v1     | 1547 MiB (49 %) | 3153.859 | 3378.383
 uuid_serial | 1402 MiB (47 %) | 3008.055 | 3378.383
 uuid_v1_ext | 1403 MiB (47 %) | 3008.055 | 3378.383

So, the question for me would be: Is there any kind of data that plays
optimal with space-savings in a rolling (e.g. last X rows) scenario?

> 
>> 5.) REINDEX
>> Table: uuid_v1              Time: 21549.860 ms (00:21.550)
>> Table: uuid_v1_timestamp    Time: 27367.817 ms (00:27.368)
>> Table: uuid_seq             Time: 19142.711 ms (00:19.143)
>> Table: uuid_serial          Time: 16889.807 ms (00:16.890)
>>
>> Even in this case it looks as if my implementation is faster than
>> anything else - which I really don't get.
> 
> Sorting already-sorted data is faster. CREATE INDEX is mostly a big
> sort operation in the case of B-Tree indexes.

Understood, this seems to be confirmed by my time-sorted index in the
new tests:
uuid_v1:     27632.660 ms (00:27.633)
uuid_serial: 20519.363 ms (00:20.519) x1.35
uuid_v1_ext: 23846.474 ms (00:23.846) x1.16

> 
>> I might implement a different opclass for the standard UUID to enable
>> time-wise index sort order. This will naturally be very close to
>> physical order but I doubt that this is something I can tell PostgreSQL, or?
> 
> PostgreSQL only knows whether or not your page splits occur in the
> rightmost page in the index -- it fills the page differently according
> to whether or not that is the case.
> 

As I've implemented the new opclass and the new tests showing the
results now, I think I can say that the time-sorting behavior as opposed
to rather random really benefits the overall performance, which is what
I actually care about most.

Cheers,

	Ancoron