RE: autovacuum locking question

Mike Schanne <mschanne@xxxxxxx> · Fri, 6 Dec 2019 15:55:32 +0000

The error is not actually showing up very often (I have 8 occurrences from 11/29 and none since then).  So maybe I should not be concerned about it.  I suspect
 we have an I/O bottleneck from other logs (i.e. long checkpoint sync times), so this error may be a symptom rather than the cause.

From: Jeff Janes [mailto:jeff.janes@xxxxxxxxx]

Sent: Thursday, December 05, 2019 6:55 PM

To: Mike Schanne

Cc: pgsql-performance@xxxxxxxxxxxxxx

Subject: Re: autovacuum locking question

On Thu, Dec 5, 2019 at 5:26 PM Mike Schanne <mschanne@xxxxxxx> wrote:

Hi,
I am investigating a performance problem in our application and am seeing something unexpected in the postgres logs regarding the autovacuum.

2019-12-01 13:05:39.029 UTC,"wb","postgres",6966,"127.0.0.1:53976",5ddbd990.1b36,17099,"INSERT waiting",2019-11-25 13:39:28
 UTC,12/1884256,12615023,LOG,00000,"process 6966 still waiting for RowExclusiveLock on relation 32938 of database 32768 after 1000.085 ms","Process holding the lock: 6045. Wait queue: 6966.",,,,,"INSERT INTO myschema.mytable (...) VALUES (...) RETURNING process.mytable.mytable_id",13,,""
2019-12-01 13:05:39.458 UTC,,,6045,,5de3b800.179d,1,,2019-12-01 12:54:24 UTC,10/417900,0,ERROR,57014,"canceling autovacuum task",,,,,"automatic vacuum of table ""postgres.myschema.mytable""",,,,""

My understanding from reading the documentation was that a vacuum can run concurrently with table inserts/updates, but from reading the logs it appears they are conflicting over
 a row lock.  This particular table gets very frequent inserts/updates (10-100 inserts / sec) so I am concerned that if the autovacuum is constantly canceled, then the table never gets cleaned and its performance will continue to degrade over time.  Is it expected
 for the vacuum to be canceled by an insert in this way?

We are using postgres 9.6.10.

If the vacuum finds a lot of empty pages at the end of the table, it will try to truncate them and takes a strong lock to do so.  It is supposed to check every 20ms to see if anyone else is blocked on that lock, at which point it stops
 doing the truncation and releases the lock.  So it should never get "caught" holding the lock in order to be cancelled.  Is your setting for deadlock_timeout much lower than usual?  Also, if the truncation is bogged down in very slow IO, perhaps it doesn't
 actually get around to checking ever 20ms despite its intentionsl

How often have you seen it in the logs?

Cheers,

Jeff

This email is non-binding, is subject to contract, and neither Kulicke and Soffa Industries, Inc. nor its subsidiaries (each and collectively “K&S”) shall have any obligation to you to consummate the transactions herein or to enter into any agreement, other
 than in accordance with the terms and conditions of a definitive agreement if and when negotiated, finalized and executed between the parties. This email and all its contents are protected by International and United States copyright laws. Any reproduction
 or use of all or any part of this email without the express written consent of K&S is prohibited.