Re: Does pg_stat_get_live_tuples() matter?

Sherrylyn Branchaw <sbranchaw@xxxxxxxxx> · Wed, 10 Apr 2019 11:50:30 -0400

> Hmmm ... what was in reltuples again?
Reltuples had the correct number, and the query plans were showing the correct estimates.

> it's a bit hard to believe that it could get to be off by 1000X.  Have you suppressed autovacuum on this table?

No, but here are some things I've observed: 

1)  Any time pg_stat_reset() gets run, pg_stat_user_tables.n_live_tup gets reset to 0. pg_class.reltuples is untouched. 

2) If new tuples get inserted or deleted after pg_stat_reset(), pg_stat_user_tables.n_live_tup will match pg_stat_user_tables.n_tup_ins - pg_stat_user_tables.n_tup_del up until the next analyze or autoanalyze.

3) Once the next (auto)analyze happens, pg_stat_user_tables.n_live_tup will be updated to match pg_class.reltuples.

4) If a table is very large, it may be a while before autovacuum_analyze_scale_factor is reached. We have ours set to 0.01, which is an order of magnitude larger than 1.8K on a 1.8M row table. (I would like to tune our settings more, but I have a list of priorities from management on which this isn't high.)

5) Given 1-4, pg_stat_user_tables.n_live_tup may spend quite a long time matching pg_stat_user_tables.n_tup_ins - pg_stat_user_tables.n_tup_del instead of pg_class.reltuples. For example, if a table has 1.8 million rows, and you insert 5 and delete 4 after a stats reset, n_live_tup will report that the table has 1 tuple.

6)  Failovers, at least in Aurora, apparently cause pg_stat_reset() to be run, at least judging by the timestamp I'm seeing in pg_stat_bgwriter.stat_reset. We haven't done a failover in the data center in a while, and it's less trivial for me to test there atm, so I'm not certain whether open-source Postgres failovers also reset statistics.

> I don't see anything in the current core code that pays attention to n_live_tuples.  reltuples definitely does matter to the planner, and some of the sibling counters like n_dead_tuples drive autovacuum, but nothing is examining n_live_tuples AFAICS.

That's what I thought, but I wanted to make sure I wasn't missing anything obvious. Thanks!

> some of the sibling counters like n_dead_tuples drive autovacuum

So that's interesting. I knew that, but hadn't thought about the implications. If pg_stat_reset() is executed by failovers, and a failover happens just before a table is ready to be vacuumed--say it has 0.009 * reltuples dead tuples (I'm simplifying the formula here)--then n_dead_tup gets reset to 0 and the counting starts all over again. Regular failovers could thus increase bloat by delaying the autovacuum daemon from recognizing that a table needs to be vacuumed, am I right?

Is it recommended practice to manually VACUUM ANALYZE the whole database after a failover? Or is resetting stats after a failover just an Aurora thing? I'm sorry I'm asking the latter question instead of testing, but I've been ordered not to spend time on improving our vacuuming and statistics until 5 other large projects are done, and I'm spending a minimal amount of time anyway just to see how badly frequent failovers might be affecting us and if there's any action we need to take.

Thanks,
Sherrylyn