On Thu, Aug 31, 2023 at 11:06 AM Maxim Boguk <maxim.boguk@xxxxxxxxx> wrote:
With the default value of autovacuum_vacuum_scale_factor (0.2, i.e. 20% of table size), the index will collect something like 100M outdated/dead index entries before autovacuum kicks in and cleans them all (in the worst case), and of course that leads to huge index bloat and awful performance.
Index bloat doesn't automatically lead to awful performance. There must be some additional factor at play.
Even if you scale down autovacuum_vacuum_scale_factor to some unreasonably low value like 0.01, the index will still bloat to 5M dead entries before autovacuum runs, and constant vacuuming of a huge 500M-row table will put a heavy load on the database server.
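For what it's worth, the scale factor doesn't have to be lowered globally; autovacuum parameters can be overridden per table, which at least confines the extra vacuum activity to the queue table. A sketch, with a hypothetical table name:

    ALTER TABLE task_queue SET (
        autovacuum_vacuum_scale_factor = 0.0,  -- ignore table size entirely
        autovacuum_vacuum_threshold = 100000   -- vacuum after ~100k dead tuples
    );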
Unfortunately there is no easy way out of this situation from the database side. In general I recommend not trying to implement a fast-paced queue-like workload inside a huge and constantly growing table; it never works well, because you cannot keep efficient partial indexes for the queue in a clean/non-bloated state. In my opinion the best solution is to keep the list of entries to process ("around 1000-1500 tasks in pending statuses") duplicated in a separate tiny table (maintained via triggers or at the application level); in that case autovacuum will be able to quickly clean dead entries from the index.
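For concreteness, a minimal sketch of that trigger-maintained side table, assuming a tasks table with id and status columns (all names hypothetical):

    CREATE TABLE pending_tasks (task_id bigint PRIMARY KEY);

    CREATE FUNCTION sync_pending_tasks() RETURNS trigger AS $$
    BEGIN
        IF NEW.status = 'pending' THEN
            -- mirror newly pending tasks into the tiny side table
            INSERT INTO pending_tasks VALUES (NEW.id)
            ON CONFLICT DO NOTHING;
        ELSE
            -- drop tasks that have left the pending state
            DELETE FROM pending_tasks WHERE task_id = NEW.id;
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER pending_tasks_sync
        AFTER INSERT OR UPDATE OF status ON tasks
        FOR EACH ROW EXECUTE FUNCTION sync_pending_tasks();

(A DELETE trigger on tasks would be needed as well; omitted for brevity.) Because pending_tasks never grows past a few thousand rows, autovacuum can keep it and its index clean almost for free.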
You should be able to use declarative partitioning to separate the "final" tuples from the "active" tuples, to get the same benefit but with less work.
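Something along these lines, assuming the queue lives in a tasks table with a status column (names hypothetical):

    CREATE TABLE tasks (
        id      bigint NOT NULL,
        status  text   NOT NULL,
        payload jsonb
    ) PARTITION BY LIST (status);

    CREATE TABLE tasks_active PARTITION OF tasks
        FOR VALUES IN ('pending', 'running');
    CREATE TABLE tasks_final PARTITION OF tasks
        FOR VALUES IN ('done', 'failed');

An UPDATE that changes status moves the row between partitions automatically (PostgreSQL 11 and later), so the active partition stays tiny and autovacuum has no trouble keeping its indexes clean.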
Cheers,
Jeff