Re: index bloat estimation

Keith Fiske <keith.fiske@xxxxxxxxxxxxxxx> · Tue, 16 Feb 2021 19:06:39 -0500

On Sun, Feb 14, 2021 at 11:43 PM Victor Sudakov <vas@xxxxxxxxxx> wrote:
Keith Fiske wrote:

> >

> > What queries do you use to estimate index and table bloat?

[dd]

> Why estimate when you can get the exact amount? At least for b-tree indexes

> anyway.

> 

> https://github.com/keithf4/pg_bloat_check

Hello Keith,

Thanks for this script and for reminding about pgstattuple.

The script has the drawback of requiring r/w access to the database, so

it cannot be run on a replica. But the idea is excellent. At least it

can show the top N bloated relations, and then I can further explore with

pgstattuple(), pgstatindex() etc.

-- 

Victor Sudakov,  VAS4-RIPE, VAS47-RIPN

2:5005/49@fidonet http://vas.tomsk.ru/

You cannot run the actual scan on the replica, no. But it can be set to run against the replica and just report the statistics so you can have a cron set up to always run on the given system in case of failover. The "--recovery_mode_norun" can be set so it will only run if the target system is actually a primary.
I could possibly see about letting this actually run against the replica, however this can be a rather long running transaction depending on the size of the tables involved. You can set the "--commit_rate" to avoid some of that, but if you have really large tables, it can still run quite long. So this sort of check is really best run against the primary to avoid issues around having to allow long running queries on the replica (delayed replication or even worse bloat buildup).

However, once you get bloat under control, you likely shouldn't need to be running this often, especially against the entire database. If a few tables turn out to be problematic, you can make schedules just for them. And otherwise run an entire database scan at most maybe once a month during off-peak hours.

-- 
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com