I am not sure why do you want to change on-disk storage format? If we are talking about indexes, it's more about comparison function (opclass) that is used in an index.
Am I wrong?
сб, 25 трав. 2019 о 11:21 Ancoron Luciferis <ancoron.luciferis@xxxxxxxxxxxxxx> пише:
On 25/05/2019 16:57, Tom Lane wrote:
> Ancoron Luciferis <ancoron.luciferis@xxxxxxxxxxxxxx> writes:
>> So I investigated the PostgreSQL code to see how it is handling UUID's
>> with respect to storage, sorting, aso. but all I could find was that it
>> basically falls back to the 16-byte.
>
> Yup, they're just blobs to us.
>
>> After struggling to find a way to optimize things inside the database, I
>> reverted to introduce a hack into the application by not shuffling the
>> timestamp bytes for the UUID's, which makes it look quite serial in
>> terms of byte order.
>
>> So, my question now is: Would it make sense for you to handle these
>> time-based UUID's differently internally? Specifically un-shuffling the
>> timestamp before they are going to storage?
>
> No, because
>
> (1) UUID layout is standardized;
You mean the presentation at the byte-level is. ;)
>
> (2) such a change would break on-disk compatibility for existing
> databases;
Yes, that certainly is a show-stopper.
>
> (3) particularly for the case of application-generated UUIDs, we do
> not have enough information to know that this would actually do anything
> useful;
Well, not only the layout is standardized, but also there is a semantic
to it depending on the version. Specifically for version 1, it has:
1. a timestamp
2. a clock sequence
3. a node id
Ans as PostgreSQL already provides this pretty concrete data type, it
could be a natural consequence to also support the semantic of it.
E.g. the network types also come with a lot of additional operators and
functions. So I don't see a reason not to respect the additional
capabilities of a UUID.
For other versions of UUID's, functions like timestamp would certainly
not be available (return NULL?), respecting the concrete semantic.
>
> (4) it in fact *wouldn't* do anything useful, because we'd still have
> to sort UUIDs in the same order as today, meaning that btree index behavior
> would remain the same as before. Plus UUID comparison would get a lot
> more complicated and slower than it is now.
I get your first sentence, but not your second. I know that when
changing the internal byte order we'd have to completed re-compute
everything on-disk (from table to index data), but why would the sorting
in the index have to be the same?
And actually, comparison logic wouldn't need to be changed at all if the
byte order is "changed" when the UUID is read in when reading the
representation into the internal UUID's byte array.
Function:
string_to_uuid(const char *source, pg_uuid_t *uuid);
^^ here I would apply the change. And of course, reverse it for
generating the textual representation.
That would slow down writes a bit, but that shouldn't be the case
because index insertions are speed up even more.
But still, on-disk change is still a show-stopper, I guess.
>
> (5) even if we ignored all that and did it anyway, it would only help
> for version-1 UUIDs. The index performance issue would still remain for
> version-4 (random) UUIDs, which are if anything more common than v1.
Yes, I am aware that the changes might be of very limited gain. V4
UUID's are usually used for external identifiers. For internal ones,
they don't make sense to me (too long, issues with randomness/enthropie
under high load, ...). ;)
I just recently used these UUID's also together with a function for
TimescaleDB auto-partitioning to provide the timestamp for the
partitioning logic instead of the need for a separate timestamp column.
This is also one of the reasons why I was also asking for native support
functions to extract the timestamp. I am apparently not very good at C
so I am currently using Python and/or PgPLSQL for it, which is pretty slow.
>
>
> FWIW, I don't know what tool you're using to get those "bloat" numbers,
> but just because somebody calls it bloat doesn't mean that it is.
> The normal, steady-state load factor for a btree index is generally
> understood to be about 2/3rds, and that looks to be about what
> you're getting for the regular-UUID-format index. The fact that the
> serially-loaded index has nearly no dead space is because we hack the
> page split logic to make that happen --- but that is a hack, and it's
> not without downsides. It should *not* be taken to be an indication
> of what you can expect for any other insertion pattern.
OK, understood. I was actually a bit surprised by those numbers myself
as these "serial" UUID's still only have the timestamp bytes in
ascending order, the clock sequence and node is still pretty random (but
not inside a single transaction, which might help the hack).
>
> The insertion-speed aspect is a real problem, but the core of that problem
> is that use of any sort of standard-format UUID turns applications that
> might have had considerable locality of reference into applications that
> have none. If you can manage to keep your whole index in RAM that would
> not hurt too much, but as soon as it doesn't fit you have a problem.
> When your app has more or less predictable reference patterns it's best
> to design a unique key that matches that, instead of expecting that
> essentially-random keys will work well.
The system was configured to have more than enough space for the index
and table data to fit into memory, but I am not sure. How can I verify
that? An EXPLAIN on the INSERT apparently doesn't include index insertion.
>
> Or in short, hacking up the way your app generates UUIDs is exactly
> the right way to proceed here.
OK. Glad to hear that.
One last question, though:
Would it make sense to create a specialized UUID v1 type (e.g. with an
extension) that does the transformation and delegates for all other
things to the existing UUID type support?
>
> regards, tom lane
>