Re: autovacuum big table taking hours and sometimes seconds

Michael Lewis <mlewis@xxxxxxxxxxx> · Thu, 14 Feb 2019 14:45:10 -0700

If there are high number of updates during normal daytime processes, then yes you need to ensure autovacuum is handling this table as needed. If the nightly delete is the only major source of bloat on this table, then perhaps running a manual vacuum keeps things tidy after the big delete. Granted, if you are manually going to vacuum, don't use vacuum full as there is not much sense in recovering that disk space if the table is going to expected to be similarly sized again by the end of the day.

Do you have a proper number of workers and maintenance_work_mem to get the job done?

As you proposed, it seems likely to be good to significantly increase autovacuum_vacuum_cost_limit on this table, and perhaps decrease autovacuum_vacuum_scale_factor if it is not being picked up as a candidate for vacuum very frequently.

Michael Lewis 

On Thu, Feb 14, 2019 at 1:08 PM Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> wrote:
No I don't run vacuum manually afterwards because the autovacuum should run. This process happens every night. Yes , bloating is an issue because the table grow and take a lot of space on disk.  Regarding the autovacuum, I think that it sleeps too much time (17h) during it's work, don't you think so? 

On Thu, Feb 14, 2019, 9:52 PM Michael Lewis <mlewis@xxxxxxxxxxx wrote:
Thanks, that context is very enlightening. Do you manually vacuum after doing the big purge of old session data? Is bloat causing issues for you? Why is it a concern that autovacuum's behavior varies?

Michael Lewis

On Thu, Feb 14, 2019 at 12:41 PM Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> wrote:
Maybe by explaining the tables purpose it will be cleaner. The original table contains rows for sessions in my app. Every session saves for itself some raw data which is saved in the toasted table. We clean old sessions (3+ days) every night. During the day sessions are created so the size of the table should grow during the day and freed in the night after the autovacuum run.However, the autovacuums sleeps for alot of time and during that time more sessions are created so maybe this can explain the big size ? Do you think that by increasing the cost limit and decreasing the cost delay I can solve the issue ?

On Thu, Feb 14, 2019, 8:38 PM Michael Lewis <mlewis@xxxxxxxxxxx wrote:
It is curious to me that the tuples remaining count varies so wildly. Is this expected?

Michael Lewis

On Thu, Feb 14, 2019 at 9:09 AM Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> wrote:
I checked in the logs when the autovacuum vacuum my big toasted table during the week and I wanted to confirm with you what I think : 
postgresql-Fri.log:2019-02-08 05:05:53 EST  24776  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 8
postgresql-Fri.log-	pages: 2253 removed, 13737828 remain
postgresql-Fri.log-	tuples: 21759258 removed, 27324090 remain
postgresql-Fri.log-	buffer usage: 15031267 hits, 21081633 misses, 19274530 dirtied
postgresql-Fri.log-	avg read rate: 2.700 MiB/s, avg write rate: 2.469 MiB/s
--
postgresql-Mon.log:2019-02-11 01:11:46 EST  8426  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 23
postgresql-Mon.log-	pages: 0 removed, 23176876 remain
postgresql-Mon.log-	tuples: 62269200 removed, 82958 remain
postgresql-Mon.log-	buffer usage: 28290538 hits, 46323736 misses, 38950869 dirtied
postgresql-Mon.log-	avg read rate: 2.850 MiB/s, avg write rate: 2.396 MiB/s
--
postgresql-Mon.log:2019-02-11 21:43:19 EST  24323  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 1
postgresql-Mon.log-	pages: 0 removed, 23176876 remain
postgresql-Mon.log-	tuples: 114573 removed, 57785 remain
postgresql-Mon.log-	buffer usage: 15877931 hits, 15972119 misses, 15626466 dirtied
postgresql-Mon.log-	avg read rate: 2.525 MiB/s, avg write rate: 2.470 MiB/s
--
postgresql-Sat.log:2019-02-09 04:54:50 EST  1793  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 13
postgresql-Sat.log-	pages: 0 removed, 13737828 remain
postgresql-Sat.log-	tuples: 34457593 removed, 15871942 remain
postgresql-Sat.log-	buffer usage: 15552642 hits, 26130334 misses, 22473776 dirtied
postgresql-Sat.log-	avg read rate: 2.802 MiB/s, avg write rate: 2.410 MiB/s
--
postgresql-Thu.log:2019-02-07 12:08:50 EST  29630  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 13
postgresql-Thu.log-	pages: 0 removed, 10290976 remain
postgresql-Thu.log-	tuples: 35357057 removed, 3436237 remain
postgresql-Thu.log-	buffer usage: 11854053 hits, 21346342 misses, 19232835 dirtied
postgresql-Thu.log-	avg read rate: 2.705 MiB/s, avg write rate: 2.437 MiB/s
--
postgresql-Tue.log:2019-02-12 20:54:44 EST  21464  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 10
postgresql-Tue.log-	pages: 0 removed, 23176876 remain
postgresql-Tue.log-	tuples: 26011446 removed, 49426774 remain
postgresql-Tue.log-	buffer usage: 21863057 hits, 28668178 misses, 25472137 dirtied
postgresql-Tue.log-	avg read rate: 2.684 MiB/s, avg write rate: 2.385 MiB/s
--

Lets focus for example on one of the outputs :
postgresql-Fri.log:2019-02-08 05:05:53 EST  24776  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 8
postgresql-Fri.log-	pages: 2253 removed, 13737828 remain
postgresql-Fri.log-	tuples: 21759258 removed, 27324090 remain
postgresql-Fri.log-	buffer usage: 15031267 hits, 21081633 misses, 19274530 dirtied
postgresql-Fri.log-	avg read rate: 2.700 MiB/s, avg write rate: 2.469 MiB/s

The cost_limit is set to 200 (default) and the cost_delay is set to 20ms. 
The calculation I did : (1*15031267+10*21081633+20*19274530)/200*20/1000 = 61133.8197 seconds ~ 17H
So autovacuum was laying down for 17h ? I think that I should increase the cost_limit to max specifically on the toasted table. What do you think ? Am I wrong here ?

‫בתאריך יום ה׳, 7 בפבר׳ 2019 ב-18:26 מאת ‪Jeff Janes‬‏ <‪jeff.janes@xxxxxxxxx‬‏>:‬
On Thu, Feb 7, 2019 at 6:55 AM Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> wrote:

I have 3 questions : 
1)To what value do you recommend to increase the vacuum cost_limit ? 2000 seems reasonable ? Or maybe its better to leave it as default and assign a specific value for big tables ?

That depends on your IO hardware, and your workload.  You wouldn't want background vacuum to use so much of your available IO that it starves your other processes.

2)When the autovacuum reaches the cost_limit while trying to vacuum a specific table, it wait nap_time seconds and then it continue to work on the same table ? 

No, it waits for autovacuum_vacuum_cost_delay before resuming within the same table. During this delay, the table is still open and it still holds a lock on it, and holds the transaction open, etc.  Naptime is entirely different, it controls how often the vacuum scheduler checks to see which tables need to be vacuumed again.

3)So in case I have a table that keeps growing (not fast because I set the vacuum_scale_factor to 0 and the autovacuum_vacuum_threshold to 10000). If the table keep growing it means I should try to increase the cost right ? Do you see any other option ?

 You can use pg_freespacemap to see if the free space is spread evenly throughout the table, or clustered together.  That might help figure out what is going on.  And, is it the table itself that is growing, or the index on it?

Cheers,

Jeff