Re: Large pkey index on insert-only table

Peter Geoghegan <pg@xxxxxxx> · Mon, 26 Jun 2023 10:59:00 -0700

On Mon, Jun 26, 2023 at 8:50 AM Devin Ivy <devinivy@xxxxxxxxx> wrote:
> Any thoughts on why this may be, or where to go next to continue tracking this down?  Also, could the primary key column order `(id, ancestor_id)` vs `(ancestor_id, id)` significantly affect the index size depending on the column cardinalities?

That is certainly possible, yes. I describe one particular pathology
that causes composite B-Tree indexes to only have about 50% space
utilization here:

https://youtu.be/p5RaATILoiE?t=2079

Theoretically this shouldn't be a problem anymore, because (as the
talk describes) Postgres 12 added heuristics that avoid the problem.
But those heuristics are kind of conservative; they only kick in when
it's fairly clearly the right thing to do. I don't think that they'll
work very reliably for varchar columns.

Note that 90% space utilization isn't really the standard case for
B-Trees in general. Even absent updates and deletes, an index with
completely random insertions (e.g., a UUID index) is expected to have
about 70% space utilization. You can only really expect ~90% space
utilization with monotonically increasing insertions.

On the other hand having less than 50% space utilization is pretty
poor, so (assuming that that's what "bloat percentage 47%" means) then
I'd say that you're right to suspect that something is a bit off here.
This isn't necessarily a big deal, but I tend to agree that what
you're seeing is something that theoretically can be avoided by the
implementation (if there were deletes involved then that wouldn't
apply, but there aren't).

--
Peter Geoghegan