What would cause a VACUUM ANALYZE to acquire an ACCESS EXCLUSIVE lock (not running as a VACUUM FULL)?
Harold FalkmeyerOn Tue, Jun 6, 2017 at 11:43 AM, Harold Falkmeyer <hfalkmeyer@xxxxxxxxx> wrote:
I'd forgotten to mention that the VACUUM VERBOSE ANALYZEs were/are consistently completing their passes over indices. While the VACUUM ANALYZEs were disruptive, and looping as described above, the VERBOSE progress report was last indicating that "DETAIL: N dead row versions cannot be removed yet."HaroldOn Mon, Jun 5, 2017 at 7:18 PM, Harold Falkmeyer <hfalkmeyer@xxxxxxxxx> wrote:Thank you for your reply. Please find my comments inline below.Andres,On Mon, Jun 5, 2017 at 3:47 PM, Andres Freund <andres@xxxxxxxxxxx> wrote:Hi,
On 2017-06-05 14:20:12 -0700, Harold Falkmeyer wrote:
> My firm manages a somewhat large PostgreSQL database (400'ish tables, 5
> TiB'ish on disk, 200'ish GiB pg_dumps, etc.). We're presently running
> PostgreSQL 8.4.22 on Linux 2.6.32 with Slony-I 2.0.8 for replication.
First off, you should seriously consider upgrading. The situation has
gotten considerably better since 8.4, particularly in 9.6. Also the 8.4
branch hasn't received bugfixed (including security fixes) for nearly
three years.Agreed. We plan on upgrading to at least 9.4.9, if not 9.6.x within the next several months; however, we're not quite there yet.> Over the past several weeks, our weekly VACUUM ANALYZE job has become
> disruptive to other queries, seemingly spinning on one particular table
> (one of our largest in rows, columns, indexes, foreign keys, and general
> use), ultimately requiring that the job be terminated. With the number of
> INSERTs, UPDATEs, and DELETEs going through our system, it's imperative
> that VACUUM ANALYZEs be completed periodically. In the past, and given our
> usage patterns, we've opted for weekly VACUUM ANALYZEs in lieu of
> autovacuum, which is presently disabled.
That sounds like it might seriously exascerbate the problem - weekly is
really infrequent for both vacuum and analyze.Agreed. The intention for the new setup is to enable autovacuum.> After about 60-90 minutes working on the aforementioned table, and with the
> VACUUM ANALYZE at the top of our longest running queries list, other
> operations attempting to read from or write to on that table either
> complete extremely slowly or are awaiting locks held by the VACUUM
> ANALYZE. Sampling during the most recent disruptive period, the VACUUM
> ANALYZE pid held 20 locks: 17 RowExclusiveLocks, one AccessExclusiveLock,
> and one ShareUpdateExclusive, all on the aforementioned table, and one
> ExclusiveLock for the containing transaction.
An ExclusiveLock on the transaction is completely normal, and only used
for other sessions to be able to wait for the completion of the the
holder's transaction.
It does not entirely sound like you're suffering from locking problems.
Are there any locks that are not granted (e.g. SELECT * FROM pg_locks
WHERE NOT granted;) during that period? If so, what commands are
using these locks, you should be able to do normal table manipulations
while vacuum is in progress.Many other processes were listed as having ungranted locks, including many SELECT statements having ungranted AccessShareLocks for the affected relation. In fact, the ONLY granted locks for the affected relation were the AccessExclusiveLock and ShareUpdateExclusive locks held by the VACUUM ANALYZE. Under what circumstances does VACUUM ANALYZE attempt to acquire an AccessExclusiveLock?> After sending SIGTERMs to
> all other pids enumerated in pg_locks as holding or wanting locks on that
> table, the VACUUM ANALYZE continued without apparent progress. At the
> time, the containing process was consuming a rough average of about 6% of
> one CPU and 4.9% memory (no swapping), was holding 641 descriptors, was in
> a seeming loop of lseek, read, memcpy, and memcmp, with occasional semop
> calls. During this process, we attempted to SIGTERM all other connections
> to ensure no stale transactions, locks, etc. were preventing the VACUUM
> ANALYZE from finishing. Overall system load and I/O activity was quite low.
> Ultimately, we've felt forced to terminate the VACUUM ANALYZEs. After
> sending it a SIGTERM and after about 30 seconds of seeing no change, we
> attempted a service postgresql-8.4 stop. After 30 additional seconds, we
> sent a SIGQUIT to the VACUUM ANALYZE process (which we always try to
> avoid), which terminated it and, in turn, PostgreSQL stopped. Thereafter,
> the restarts seemed quick and uneventful, with cleanup happening without
> reported exception. Only once in the last month or so have we seen a
> VACUUM ANALYZE on this table complete.
Well, that'll make the problem worse and worse. I suspect your setting
might not be appropriately aggressive to be able to collect the
accumulated cruft. What's your maintenance_work_mem setting, and what
are your vacuum_cost_delay/vacuum_cost_limit settings? In what way do the SIGQUITs aggravate problems? I was under the impression that any residual artifacts left behind by a SIGQUIT (or immediate shutdown) are resolved when the database is next started!?Our maintenance_work_mem is 16 GiB, our vacuum_cost_delay is 0 ms, and our vacuum_cost_limit is 200. Host memory is 126 GiB. Perhaps a significant increase to maintenance_work_mem is warranted (and safe, given that we don't presently have the risk of multiple autovacuum workers). Do you have general rule-of-thumb recommendations for these settings (e.g., relative to other settings)?Are there other sources I should be querying for useful information? For example, I understand that pg_locks doesn't show all PostgreSQL locks that could cause interference (e.g., as mentioned in http://rhaas.blogspot.com/2011/03/troubleshooting-stuck-vacu !?ums.html) Greetings,
Andres Freund