On 08/07/2019 02:26, Peter Geoghegan wrote:
> Please don't top post -- trim your response down so that only
> still-relevant text remains.
>
> On Tue, Jun 11, 2019 at 1:27 PM Ancoron Luciferis
> <ancoron.luciferis@xxxxxxxxxxxxxx> wrote:
>> Primary key indexes after an ANALYZE:
>>
>>     table_name     |     bloat      | index_mb | table_mb
>> -------------------+----------------+----------+----------
>>  uuid_v1           | 767 MiB (49 %) | 1571.039 | 1689.195
>>  uuid_v1_timestamp | 768 MiB (49 %) | 1571.039 | 1689.195
>>  uuid_seq          | 759 MiB (49 %) | 1562.766 | 1689.195
>>  uuid_serial       | 700 MiB (47 %) | 1504.047 | 1689.195
>>
>> OK, sadly no reclaim in any of them.
>
> I don't know how you got these figures, but most likely they don't
> take into account the fact that the FSM for the index has free blocks
> available. You'll only notice that if you have additional page splits
> that can recycle that space. Or, you could use pg_freespacemap to get
> some idea.

Hm, I think I've already read quite a bit about the internals of the
PG B-Tree index implementation, but I still cannot figure out how I
could influence that on my end, as I want to stay compatible with the
standard UUID data storage but need time sorting support.

Anyway, I've made a bit of progress in testing and now have the full
tests executing unattended with the help of a script:

https://github.com/ancoron/pg-uuid-test

I've uploaded one of the test run results here:

https://gist.github.com/ancoron/d5114b0907e8974b6808077e02f8d109

After the first mass deletion, I can now see quite some savings, both
for serial and for my new time-sorted index:

 table_name  |      bloat      | index_mb | table_mb
-------------+-----------------+----------+----------
 uuid_v1     | 1500 MiB (48 %) | 3106.406 | 3378.383
 uuid_serial |  800 MiB (33 %) | 2406.453 | 3378.383
 uuid_v1_ext |  800 MiB (33 %) | 2406.453 | 3378.383

...but in the second case (DELETE old + INSERT new), the savings are
gone again for both:

 table_name  |      bloat      | index_mb | table_mb
-------------+-----------------+----------+----------
 uuid_v1     | 1547 MiB (49 %) | 3153.859 | 3378.383
 uuid_serial | 1402 MiB (47 %) | 3008.055 | 3378.383
 uuid_v1_ext | 1403 MiB (47 %) | 3008.055 | 3378.383

So the question for me would be: is there any kind of data layout that
behaves optimally with respect to space savings in a rolling (e.g.
keep only the last X rows) scenario?
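
In the meantime I'll follow your pg_freespacemap hint to see how many
index pages are actually tracked as recyclable. A minimal sketch of
what I have in mind (assuming the pg_freespacemap extension is
installed; "uuid_v1_pkey" is just the assumed primary key index name
from my test setup):

CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

-- For B-Tree indexes the FSM tracks whole recyclable pages rather
-- than partial free space, so "avail" is effectively all-or-nothing
-- per page.
SELECT count(*) AS recyclable_pages,
       pg_size_pretty(sum(avail)) AS total_free
  FROM pg_freespace('uuid_v1_pkey')
 WHERE avail > 0;

If those numbers stay near zero right after the mass deletion, that
would at least explain why the bloat figures above don't move until
later page splits recycle the space.
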
>> 5.) REINDEX
>> Table: uuid_v1           Time: 21549.860 ms (00:21.550)
>> Table: uuid_v1_timestamp Time: 27367.817 ms (00:27.368)
>> Table: uuid_seq          Time: 19142.711 ms (00:19.143)
>> Table: uuid_serial       Time: 16889.807 ms (00:16.890)
>>
>> Even in this case it looks as if my implementation is faster than
>> anything else - which I really don't get.
>
> Sorting already-sorted data is faster. CREATE INDEX is mostly a big
> sort operation in the case of B-Tree indexes.

Understood, and this seems to be confirmed by my time-sorted index in
the new tests:

uuid_v1:     27632.660 ms (00:27.633)
uuid_serial: 20519.363 ms (00:20.519)  x1.35
uuid_v1_ext: 23846.474 ms (00:23.846)  x1.16

>> I might implement a different opclass for the standard UUID type to
>> enable time-wise index sort order. This will naturally be very close
>> to physical order, but I doubt that this is something I can tell
>> PostgreSQL, or?
>
> PostgreSQL only knows whether or not your page splits occur in the
> rightmost page in the index -- it fills the page differently according
> to whether or not that is the case.

Now that I've implemented the new opclass and have the new tests
showing the results, I think I can say that the time-sorted insertion
order, as opposed to an essentially random one, really benefits the
overall performance, which is what I actually care about most.

Cheers,

	Ancoron
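
P.S.: For anyone curious, the SQL-level wiring of the new opclass
boils down to something like the following sketch. The function and
operator names here are illustrative, not necessarily the exact ones
in my extension (the comparison logic itself lives in C); a
non-default btree opclass needs its own ordering operators plus a
comparison support function, while plain equality can stay byte-wise:

-- C-level comparison: embedded v1 timestamp first, remaining bytes
-- as tie-breaker (illustrative names throughout).
CREATE FUNCTION uuid_timestamp_cmp(uuid, uuid) RETURNS integer
AS 'MODULE_PATHNAME', 'uuid_timestamp_cmp'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

-- One ordering operator shown; <=*, >=* and >* follow the same
-- pattern with their own C support functions.
CREATE FUNCTION uuid_timestamp_lt(uuid, uuid) RETURNS boolean
AS 'MODULE_PATHNAME', 'uuid_timestamp_lt'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE OPERATOR <* (
    LEFTARG = uuid, RIGHTARG = uuid,
    PROCEDURE = uuid_timestamp_lt,
    COMMUTATOR = >*, NEGATOR = >=*,
    RESTRICT = scalarltsel, JOIN = scalarltjoinsel
);

CREATE OPERATOR CLASS uuid_timestamp_ops FOR TYPE uuid
USING btree AS
    OPERATOR 1 <*,
    OPERATOR 2 <=*,
    OPERATOR 3 =,
    OPERATOR 4 >=*,
    OPERATOR 5 >*,
    FUNCTION 1 uuid_timestamp_cmp(uuid, uuid);

An index then simply opts in to the time-wise sort order:

CREATE INDEX ON uuid_v1_ext (id uuid_timestamp_ops);
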