On 25/05/2019 21:00, Vitalii Tymchyshyn wrote:
> I am not sure why do you want to change on-disk storage format? If we
> are talking about indexes, it's more about comparison function (opclass)
> that is used in an index.
> Am I wrong?

I don't "want" to change the on-disk format of the v1 UUIDs, but that
seems to be the most efficient approach to me (if anyone would ever want
to implement it), as not changing it would mean introducing the
unshuffling of the timestamp bytes in a lot of other places, where it
would decrease performance instead of increasing it, not to speak of
VACUUM, REINDEX, CLUSTER, ....

One of my use-cases would also benefit a lot here when trying to find
entries that have been created within a given time range. To do that
(without an additional timestamp column), I'd end up with a full index
scan to determine the rows to read, as there is no correlation between
the timestamp order and the value sort order.

For example, I would like to query the DB with:

WHERE id >= '7d9c4835-5378-11ec-0000-000000000000'
AND id < '5cdcac78-5a9a-11ec-0000-000000000000'

...which is impossible atm.

I have written myself a helper function that extracts the timestamp, but
it is executed (of course) for each and every entry while filtering, or,
if used as an index expression, increases the write RTT by quite a lot
(if you're dealing with peak-load style applications like we have):

WHERE time_part_uuidv1(id) >= '2021-12-02 14:02:31.021778+00'
AND time_part_uuidv1(id) < '2021-12-11 15:52:37.107212+00'

So, although possible, it is not really an option for an application
like ours (and we don't just want to throw hardware at it).

I don't know if all that makes any sense to you or if I'm just missing a
piece of the puzzle inside the PostgreSQL code base. If I do, please
tell me. :)

Cheers,

	Ancoron

> Sat, 25 May 2019 at 11:21, Ancoron Luciferis
> <ancoron.luciferis@xxxxxxxxxxxxxx> wrote:
>
>     On 25/05/2019 16:57, Tom Lane wrote:
>     > Ancoron Luciferis <ancoron.luciferis@xxxxxxxxxxxxxx> writes:
>     >> So I investigated the PostgreSQL code to see how it is handling
>     >> UUIDs with respect to storage, sorting, and so on, but all I could
>     >> find was that it basically falls back to the 16-byte value.
>     >
>     > Yup, they're just blobs to us.
>     >
>     >> After struggling to find a way to optimize things inside the
>     >> database, I reverted to introducing a hack into the application by
>     >> not shuffling the timestamp bytes for the UUIDs, which makes them
>     >> look quite serial in terms of byte order.
>     >
>     >> So, my question now is: Would it make sense for you to handle
>     >> these time-based UUIDs differently internally? Specifically,
>     >> un-shuffling the timestamp before they go to storage?
>     >
>     > No, because
>     >
>     > (1) UUID layout is standardized;
>
>     You mean the presentation at the byte level is. ;)
>
>     > (2) such a change would break on-disk compatibility for existing
>     > databases;
>
>     Yes, that certainly is a show-stopper.
>
>     > (3) particularly for the case of application-generated UUIDs, we do
>     > not have enough information to know that this would actually do
>     > anything useful;
>
>     Well, not only is the layout standardized, there are also semantics
>     attached to it depending on the version. Specifically, version 1 has:
>     1. a timestamp
>     2. a clock sequence
>     3. a node id
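For illustration, here is a minimal sketch of such a timestamp-extraction
helper in plain SQL, assuming the standard v1 text layout (illustrative
only, not necessarily the implementation behind the time_part_uuidv1()
used above):

CREATE OR REPLACE FUNCTION time_part_uuidv1(id uuid)
RETURNS timestamptz
LANGUAGE sql IMMUTABLE STRICT
AS $$
  -- Re-assemble the 60-bit v1 timestamp from its shuffled layout:
  -- hex chars 1-8 = time_low, 9-12 = time_mid, 14-16 = time_hi
  -- (char 13 is the version nibble and is skipped).
  SELECT to_timestamp(
           ( (('x' || substr(u, 14, 3))::bit(12)::bigint << 48)
           + (('x' || substr(u,  9, 4))::bit(16)::bigint << 32)
           +  ('x' || substr(u,  1, 8))::bit(32)::bigint
           - 122192928000000000  -- 100 ns ticks from 1582-10-15 to 1970-01-01
           ) / 10000000.0        -- 100 ns ticks -> seconds
         )
  FROM (SELECT replace(id::text, '-', '')) AS t(u);
$$;

Marked IMMUTABLE so it can at least be used in an index expression, with
the write-side cost mentioned above.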
>     And as PostgreSQL already provides this pretty concrete data type,
>     it could be a natural consequence to also support its semantics.
>
>     E.g. the network types also come with a lot of additional operators
>     and functions. So I don't see a reason not to respect the additional
>     capabilities of a UUID.
>
>     For other versions of UUIDs, functions like the timestamp extraction
>     would simply not apply (return NULL?), respecting the concrete
>     semantics.
>
>     > (4) it in fact *wouldn't* do anything useful, because we'd still
>     > have to sort UUIDs in the same order as today, meaning that btree
>     > index behavior would remain the same as before. Plus UUID
>     > comparison would get a lot more complicated and slower than it is
>     > now.
>
>     I get your first sentence, but not your second. I know that when
>     changing the internal byte order we'd have to completely re-compute
>     everything on disk (from table to index data), but why would the
>     sorting in the index have to stay the same?
>
>     And actually, the comparison logic wouldn't need to change at all if
>     the byte order is "changed" while the textual representation is read
>     into the internal UUID byte array.
>
>     Function:
>     string_to_uuid(const char *source, pg_uuid_t *uuid);
>
>     ^^ here I would apply the change. And of course, reverse it when
>     generating the textual representation. (A user-level approximation
>     of this transformation is sketched below.)
>
>     That would slow down writes a bit, but it shouldn't matter much as
>     index insertions would be sped up even more.
>
>     But still, the on-disk change is a show-stopper, I guess.
>
>     > (5) even if we ignored all that and did it anyway, it would only
>     > help for version-1 UUIDs. The index performance issue would still
>     > remain for version-4 (random) UUIDs, which are if anything more
>     > common than v1.
>
>     Yes, I am aware that the changes might be of very limited gain. V4
>     UUIDs are usually used for external identifiers. For internal ones,
>     they don't make sense to me (too long, issues with randomness/entropy
>     under high load, ...). ;)
>
>     I just recently used these UUIDs together with a function for
>     TimescaleDB auto-partitioning, to provide the timestamp for the
>     partitioning logic instead of needing a separate timestamp column.
>
>     This is also one of the reasons why I was asking for native support
>     functions to extract the timestamp. I am apparently not very good at
>     C, so I am currently using Python and/or PL/pgSQL for it, which is
>     pretty slow.
>
>     > FWIW, I don't know what tool you're using to get those "bloat"
>     > numbers, but just because somebody calls it bloat doesn't mean that
>     > it is. The normal, steady-state load factor for a btree index is
>     > generally understood to be about 2/3rds, and that looks to be about
>     > what you're getting for the regular-UUID-format index. The fact
>     > that the serially-loaded index has nearly no dead space is because
>     > we hack the page split logic to make that happen --- but that is a
>     > hack, and it's not without downsides. It should *not* be taken to
>     > be an indication of what you can expect for any other insertion
>     > pattern.
>
>     OK, understood. I was actually a bit surprised by those numbers
>     myself, as these "serial" UUIDs still only have the timestamp bytes
>     in ascending order; the clock sequence and node are still pretty
>     random (but not inside a single transaction, which might help the
>     hack).
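As a user-level approximation of that internal change, the same
re-shuffling can be done over the textual form. Again a sketch only,
assuming a v1 layout (the function name uuid_v1_reorder is made up here):

-- Move the timestamp fields to the front (time_hi, time_mid, time_low)
-- so that plain byte-wise UUID comparison follows creation time; clock
-- sequence and node bytes are kept as-is. The version nibble ends up
-- leading, which is harmless for ordering within v1-only data.
CREATE OR REPLACE FUNCTION uuid_v1_reorder(id uuid)
RETURNS uuid
LANGUAGE sql IMMUTABLE STRICT
AS $$
  SELECT (substr(u, 13, 4) || substr(u, 9, 4) ||
          substr(u, 1, 8) || substr(u, 17))::uuid
  FROM (SELECT replace(id::text, '-', '')) AS t(u);
$$;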
>     > The insertion-speed aspect is a real problem, but the core of that
>     > problem is that use of any sort of standard-format UUID turns
>     > applications that might have had considerable locality of reference
>     > into applications that have none. If you can manage to keep your
>     > whole index in RAM that would not hurt too much, but as soon as it
>     > doesn't fit you have a problem. When your app has more or less
>     > predictable reference patterns it's best to design a unique key
>     > that matches that, instead of expecting that essentially-random
>     > keys will work well.
>
>     The system was configured to have more than enough space for the
>     index and table data to fit into memory, but I am not sure it
>     actually did. How can I verify that? An EXPLAIN on the INSERT
>     apparently doesn't include the index insertion.
>
>     > Or in short, hacking up the way your app generates UUIDs is exactly
>     > the right way to proceed here.
>
>     OK. Glad to hear that.
>
>     One last question, though:
>     Would it make sense to create a specialized UUID v1 type (e.g. via an
>     extension) that does the transformation and delegates everything else
>     to the existing UUID type support?
>
>     > 			regards, tom lane
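Tying the sketches above together: with an IMMUTABLE helper like
uuid_v1_reorder(), the time-range query from the beginning of this mail
could be served by an expression index. Table and index names below are
placeholders:

-- The index stores the re-ordered value, so a btree range scan over it
-- follows v1 creation time.
CREATE INDEX entries_id_time_idx ON entries (uuid_v1_reorder(id));

SELECT *
FROM entries
WHERE uuid_v1_reorder(id) >= uuid_v1_reorder('7d9c4835-5378-11ec-0000-000000000000')
  AND uuid_v1_reorder(id) <  uuid_v1_reorder('5cdcac78-5a9a-11ec-0000-000000000000');

This still pays the expression cost on every insert (the write-side
overhead mentioned above), so re-ordering the bytes at generation time,
as Tom suggests, remains the cheaper option.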