Re: UUID v1 optimizations...

Morris de Oryx <morrisdeoryx@xxxxxxxxx> · Sun, 26 May 2019 14:27:05 +1000

I'm not worthy to post here, but a bit of a random thought.

If I've followed the conversation correctly, the reason for a V1 UUID is partly to order and partition rows by a timestamp value, but without the cost of a timestamp column. As I was told as a boy, "Smart numbers aren't." Is it _absolutely_ the case that you can't afford another column? I don't know the ins and outs of the Postgres row format, but my impression is that it's a fixed size, so you may be able to add the column without splitting rows? Anyway, even if that's not true and the extra column costs you disk space, is it the index that concerns you?  Have you considered a timestamp column, or a numeric column with an epoch offset, and a BRIN index? If you insert data is in pretty much chronological order, that might work well for you.

Best of luck, I've enjoyed following the commentary.

On Sun, May 26, 2019 at 11:09 AM Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> wrote:
On Sun, May 26, 2019 at 01:49:30AM +0200, Ancoron Luciferis wrote:

>On 26/05/2019 00:14, Tomas Vondra wrote:

>> On Sat, May 25, 2019 at 05:54:15PM -0400, Tom Lane wrote:

>>> Ancoron Luciferis <ancoron.luciferis@xxxxxxxxxxxxxx> writes:

>>>> On 25/05/2019 16:57, Tom Lane wrote:

>>>>> (4) it in fact *wouldn't* do anything useful, because we'd still have

>>>>> to sort UUIDs in the same order as today, meaning that btree index

>>>>> behavior

>>>>> would remain the same as before.  Plus UUID comparison would get a lot

>>>>> more complicated and slower than it is now.

>>>

>>>> I get your first sentence, but not your second. I know that when

>>>> changing the internal byte order we'd have to completed re-compute

>>>> everything on-disk (from table to index data), but why would the sorting

>>>> in the index have to be the same?

>>>

>>> Because we aren't going to change the existing sort order of UUIDs.

>>> We have no idea what applications might be dependent on that.

>>>

>>> As Vitalii correctly pointed out, your beef is not with the physical

>>> storage of UUIDs anyway: you just wish they'd sort differently, since

>>> that is what determines the behavior of a btree index.  But we aren't

>>> going to change the sort ordering because that's an even bigger

>>> compatibility break than changing the physical storage; it'd affect

>>> application-visible semantics.

>>>

>>> What you might want to think about is creating a function that maps

>>> UUIDs into an ordering that makes sense to you, and then creating

>>> a unique index over that function instead of the raw UUIDs.  That

>>> would give the results you want without having to negotiate with the

>>> rest of the world about whether it's okay to change the semantics

>>> of type uuid.

>>>

>>

>> FWIW that's essentially what I implemented as an extension some time

>> ago. See [1] for a more detailed explanation and some benchmarks.

>

>Yes, I've seen that before. Pretty nice work you but together there and

>I'll surely have a look at it but we certainly need the node id in

>compliance with v1 UUID's so that's why we've been generating UUID's at

>the application side from day 1.

>

>>

>> The thing is - it's not really desirable to get perfectly ordered

>> ordering, because that would mean we never get back to older parts of

>> the index (so if you delete data, we'd never fill that space).

>

>Wouldn't this apply also to any sequential-looking index (e.g. on

>serial)?

Yes, it does apply to any index on sequential (ordered) data. If you

delete data from the "old" part (but not all, so the pages don't get

completely empty), that space is lost. It's available for new data, but

if we only insert to "new" part of the index, that's useless.

> The main issue with the UUID's is that it almost instantly

>consumes a big part of the total value space (e.g. first value is

>'01...' and second is coming as 'f3...') which I would assume not being

>very efficient with btrees (space reclaim? - bloat).

>

I don't understand what you mean here. Perhaps you misunderstand how

btree indexes grow? It's not like we allocate separate pages for

different values/prefixes - we insert the data until a page gets full,

then it's split in half. There is some dependency on the order in which

the values are inserted, but AFAIK random order is generally fine.

>One of our major concerns is to keep index size small (VACUUM can't be

>run every minute) to fit into memory next to a lot of others.

>

I don't think this has much to do with vacuum - I don't see how it's

related to the ordering of generated UUID values. And I don't see where

the "can't run vacuum every minute" comes from.

>I've experimented with the rollover "prefix" myself but found that it

>makes the index too big (same or larger index size than standard v1

>UUIDs) and VACUUM too slow (almost as slow as a standard V1 UUID),

>although INSERT performance wasn't that bad, our sequential UUID's where

>way faster (at least pre-generated and imported with COPY to eliminate

>any value generation impact).

>

I very much doubt that has anything to do with the prefix. You'll need

to share more details about how you did your tests.

regards

-- 

Tomas Vondra                  http://www.2ndQuadrant.com

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services