Re: index bloat estimation

Gregory Smith <gregsmithpgsql@xxxxxxxxx> · Sun, 14 Feb 2021 05:36:47 -0500 (EST)

On Fri, 12 Feb 2021, Victor Sudakov wrote:

Most of the stuff I've looked at is pretty old, much seems unsupported.
What is the current best practice?

None of the estimate queries ever really worked well.  They just gave a 
bit more information than zero in the days before PG's internal functions 
were really reliable and useful for bloat measurement, something that 
happened in version 9.5 development.

For any modern PG, it's worth the trouble to learn how to directly use 
pgstattuple https://www.postgresql.org/docs/current/pgstattuple.html to do 
this job.  You run and interpret the output from pgstattuple(relation) and 
its faster estimate version pgstattuple_approx.  It's not hard to run some 
simulations with deleted rows to see what bloat looks like when it builds 
up.

There's still some need for manual estimates if you want to account for 
fillfactor in all cases, but I see that as a niche topic, not where people 
should start at.

There are also wrapper scripts built on top of pgstattuple around, like 
the already mentioned https://github.com/keithf4/pg_bloat_check

A good bit of the work done in that script is around handling multiple 
versions of PG and building some long-term idea of bloat state on all 
tables.  Workloads that have a bloat problem are sometimes fixed, but in a 
lot of cases the best you can do is monitor them and rebuild things when 
it gets bad.  That's one context Keith's packaging of this feature aims 
at.

--
Greg Smith  greg.smith@xxxxxxxxxxxxxxx
Director of Open Source Strategy
Crunchy Data https://www.crunchydata.com/