On Sat, Nov 2, 2019 at 8:52 PM Konstantin Gredeskoul <kigster@xxxxxxxxx> wrote:
A related question is — how can you avoid doing auto vacuum during the busy peak hours, and do more aggressive auto vacuuming at night during low traffic? Is that possible?I tried achieving that with a manual vacuum, but that causes all replicas to get far behind while it’s running, which is also not acceptable.
What is the main concern here? Disk IO? So we want to allow autovacuum workers to hit our disks less during busy hours and more during quiet hours, right?
If so, then one of the ideas here could be the changing quotas for the autovacuum workers dynamically using cron or its alternative:
1) throttle autovacuum before peak hours, changing autovacuum_vacuum_cost_delay (or autovacuum_vacuum_cost_limit). The pre-12 default settings (_cost_delay 20ms and cost_limit 200) will give all the workers the shared "quota" that means, roughly, 8 MiB/s for reads, less for writes (good article: https://www.2ndquadrant.com/en/blog/autovacuum-tuning-basics/), and in 12 (where _cost_delay was reduced to 2ms) it will be ~80 MiB/s. Monitoring + logs analysis is needed to understand real IO though. Individually tuned tables will have individual quotas.
2) change those settings back to normal after busy hours.
In any case, I highly recommend to monitor "autovacuum queue" -- the list of tables that should be already autovacuumed, but they are not, due to some reasons. Having proper graphs in monitoring and alerts if the size of this "queue" exceeds some reasonable number (say, 2 * autovacuum_max_workers) will help a lot to understand the autovacuum behavior for the concrete system. Here is an excellent example of how to do it, from Avito: https://github.com/avito-tech/dba-utils/blob/master/munin/vacuum_queue, and here is my version, with some improvements: https://gitlab.com/snippets/1889668.