Re: Unfortunate Nested Loop + Missing Autovacuum

Andrei Lepikhov <lepihov@xxxxxxxxx> · Mon, 24 Feb 2025 15:01:16 +0100

On 23/2/2025 23:49, Lincoln Swaine-Moore wrote:
Thanks for the reply! I tried the analysis on our much shorter staging 
table and it did change the plan. I haven’t tried it on the production 
ones because my understanding is that the autovacuum process is gentler 
with resource consumption and I didn’t want to gum things up in the 
meantime. But that may be false or avoidable.

Unfortunately the insertions are not batched—they arrive in a fairly 
steady stream. Given that, and the fact that we’re talking about the 
behavior of the daemon, I’m not sure how to provide a reproduction 
(though would be game if you had pointers). I could in theory make a job 
analyze these tables on a regular cadence, though I guess that feels 
like I’d be stepping on autovacuum’s toes.

Would be happy to read over similar reports if you could steer me toward 
them.
I know about at least two problematic use cases:
1. Batch insertions of tuples with steadily growing value of a column 
(timestamp as an example). Selecting "just arrived" tuples, we can't 
estimate cardinalities correctly because these values are out of 
histogram and MCV statistics until the next analyze.
2. The table is overcrowded by queries, and autovacuum does not have a 
chance to pass the table.

For case No.1, I attempted to extend the 'probe indexes' technique [1]. 
However, it is applicable only in specific conditions (analytic load, 
for example) and should be designed as an extension. Unfortunately, we 
still do not have selectivity hooks to let it go.

To resolve case No.2 (and not only), I have invented the 'plan freezing' 
extension [2,3]. Using it, you may find the optimal plan once and freeze 
it in the instance's plan cache, picking constants that have to be 
treated as parameters. Any changes of the table content, statistics, or 
even GUCs will not impact the query plan until "unfreezing".

Also, you may use the pg_hint_plan extension [4] to provide Postgres 
optimiser with hints that can help it generate the plan you need.

[1] https://danolivo.substack.com/p/probing-indexes-to-survive-data-skew
[2] https://danolivo.substack.com/p/designing-a-prototype-postgres-plan
[3] https://postgrespro.com/docs/enterprise/16/sr-plan
[4] https://github.com/ossc-db/pg_hint_plan

--
regards, Andrei Lepikhov