Re: autovacuum big table taking hours and sometimes seconds

Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> · Thu, 14 Feb 2019 18:09:04 +0200

I checked in the logs when the autovacuum vacuum my big toasted table during the week and I wanted to confirm with you what I think : 
postgresql-Fri.log:2019-02-08 05:05:53 EST  24776  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 8
postgresql-Fri.log-	pages: 2253 removed, 13737828 remain
postgresql-Fri.log-	tuples: 21759258 removed, 27324090 remain
postgresql-Fri.log-	buffer usage: 15031267 hits, 21081633 misses, 19274530 dirtied
postgresql-Fri.log-	avg read rate: 2.700 MiB/s, avg write rate: 2.469 MiB/s
--
postgresql-Mon.log:2019-02-11 01:11:46 EST  8426  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 23
postgresql-Mon.log-	pages: 0 removed, 23176876 remain
postgresql-Mon.log-	tuples: 62269200 removed, 82958 remain
postgresql-Mon.log-	buffer usage: 28290538 hits, 46323736 misses, 38950869 dirtied
postgresql-Mon.log-	avg read rate: 2.850 MiB/s, avg write rate: 2.396 MiB/s
--
postgresql-Mon.log:2019-02-11 21:43:19 EST  24323  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 1
postgresql-Mon.log-	pages: 0 removed, 23176876 remain
postgresql-Mon.log-	tuples: 114573 removed, 57785 remain
postgresql-Mon.log-	buffer usage: 15877931 hits, 15972119 misses, 15626466 dirtied
postgresql-Mon.log-	avg read rate: 2.525 MiB/s, avg write rate: 2.470 MiB/s
--
postgresql-Sat.log:2019-02-09 04:54:50 EST  1793  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 13
postgresql-Sat.log-	pages: 0 removed, 13737828 remain
postgresql-Sat.log-	tuples: 34457593 removed, 15871942 remain
postgresql-Sat.log-	buffer usage: 15552642 hits, 26130334 misses, 22473776 dirtied
postgresql-Sat.log-	avg read rate: 2.802 MiB/s, avg write rate: 2.410 MiB/s
--
postgresql-Thu.log:2019-02-07 12:08:50 EST  29630  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 13
postgresql-Thu.log-	pages: 0 removed, 10290976 remain
postgresql-Thu.log-	tuples: 35357057 removed, 3436237 remain
postgresql-Thu.log-	buffer usage: 11854053 hits, 21346342 misses, 19232835 dirtied
postgresql-Thu.log-	avg read rate: 2.705 MiB/s, avg write rate: 2.437 MiB/s
--
postgresql-Tue.log:2019-02-12 20:54:44 EST  21464  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 10
postgresql-Tue.log-	pages: 0 removed, 23176876 remain
postgresql-Tue.log-	tuples: 26011446 removed, 49426774 remain
postgresql-Tue.log-	buffer usage: 21863057 hits, 28668178 misses, 25472137 dirtied
postgresql-Tue.log-	avg read rate: 2.684 MiB/s, avg write rate: 2.385 MiB/s
--

Lets focus for example on one of the outputs :
postgresql-Fri.log:2019-02-08 05:05:53 EST  24776  LOG:  automatic vacuum of table "myDB.pg_toast.pg_toast_1958391": index scans: 8
postgresql-Fri.log-	pages: 2253 removed, 13737828 remain
postgresql-Fri.log-	tuples: 21759258 removed, 27324090 remain
postgresql-Fri.log-	buffer usage: 15031267 hits, 21081633 misses, 19274530 dirtied
postgresql-Fri.log-	avg read rate: 2.700 MiB/s, avg write rate: 2.469 MiB/s

The cost_limit is set to 200 (default) and the cost_delay is set to 20ms. 
The calculation I did : (1*15031267+10*21081633+20*19274530)/200*20/1000 = 61133.8197 seconds ~ 17H
So autovacuum was laying down for 17h ? I think that I should increase the cost_limit to max specifically on the toasted table. What do you think ? Am I wrong here ?

‫בתאריך יום ה׳, 7 בפבר׳ 2019 ב-18:26 מאת ‪Jeff Janes‬‏ <‪jeff.janes@xxxxxxxxx‬‏>:‬
On Thu, Feb 7, 2019 at 6:55 AM Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> wrote:

I have 3 questions : 
1)To what value do you recommend to increase the vacuum cost_limit ? 2000 seems reasonable ? Or maybe its better to leave it as default and assign a specific value for big tables ?

That depends on your IO hardware, and your workload.  You wouldn't want background vacuum to use so much of your available IO that it starves your other processes.

2)When the autovacuum reaches the cost_limit while trying to vacuum a specific table, it wait nap_time seconds and then it continue to work on the same table ? 

No, it waits for autovacuum_vacuum_cost_delay before resuming within the same table. During this delay, the table is still open and it still holds a lock on it, and holds the transaction open, etc.  Naptime is entirely different, it controls how often the vacuum scheduler checks to see which tables need to be vacuumed again.

3)So in case I have a table that keeps growing (not fast because I set the vacuum_scale_factor to 0 and the autovacuum_vacuum_threshold to 10000). If the table keep growing it means I should try to increase the cost right ? Do you see any other option ?

 You can use pg_freespacemap to see if the free space is spread evenly throughout the table, or clustered together.  That might help figure out what is going on.  And, is it the table itself that is growing, or the index on it?

Cheers,

Jeff