Thanks Jeff for the response--I did end up just analyzing the tables manually, as a stopgap. Resource consumption was a non-issue as you predicted (and plan was corrected, though estimates were still slightly awkward).
With respect to the blocking of the autovacuum/analyze: no it shouldn't be the case that those are running frequently in our case. Ditto re: database restarts--this is out of my control because it would be DigitalOcean's doing, but I don't see any evidence of it. Nor anything amiss in \dt+, unfortunately.
I'll try to figure out if I can get access to the logs to search for cancellations. Do you happen to know what that would look like if I'm grep-ing for it? And do you have any other guesses about possible explanations?
Thanks again for your help.
Lincoln
On Sun, Feb 23, 2025 at 6:09 PM Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Sun, Feb 23, 2025 at 5:49 PM Lincoln Swaine-Moore <lswainemoore@xxxxxxxxx> wrote:Thanks for the reply! I tried the analysis on our much shorter staging table and it did change the plan. I haven’t tried it on the production ones because my understanding is that the autovacuum process is gentler with resource consumption and I didn’t want to gum things up in the meantime. But that may be false or avoidable.The default setting of vacuum_cost_delay is zero, while the default setting of autovacuum_vacuum_cost_delay is 2ms (or 20ms, depending on the software version). So you can give a manual run the same resource consumption as an autorun just by changing vacuum_cost_delay in that session so that it has the same value as autovacuum_vacuum_cost_delay. Also, if you just do ANALYZE (rather than VACUUM ANALYZE) the resource usage should be rather modest anyway.Another difference is that autovac will cancel itself if it detects it is blocking something else, while a manual vac/analyze operation will not do that. Normal operations (DML) don't block against vacuum anyway, only things like index creation or partition maintenance do that. But if those types of operation are frequent, then doing a manual VACUUM or ANALYZE could indeed gum things up. Also, if those operations are frequent, it could explain the missing autovac. If every auto attempt gets cancelled before it completes, then it will never complete. pg_stat_user_tables doesn't reflect cancelled vacuum or analyze so those will go missing. (You should see mentions of cancelled autovac operations in the log file though.)Database restarts will also interrupt vacuums. So if your database is shutdown and restarted regularly (for cold back-ups, or just out of some misguided belief that restarting occasionally is a good practise) and the period between restarts is shorter than how long it would take autovac to run, this could also explain the lack of completed autovacs. Also, if a table qualifies for both auto vacuum and auto analyze, the vacuum is done first. So even if auto analyze would be fast by itself, it still won't complete if auto vacuum is slow and never gets to finish.It is possible to override your vacuum settings on a per-table basis. So another possible explanation for the missing autovacs is that those two tables have been specifically configured to disable autovacuum on them and only them. If you use psql, \dt+ (but not regular \dt) will show such customizations. I'm sure other tools also have ways to detect this, but I don't know what those ways are off the top of my head.Cheers,Jeff
Lincoln Swaine-Moore