Re: Long running backup preventing auto vacuum

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Andrey,

One more thing to add is that, vacuum will not run for catalog tables because of long running pg_stop_backup query

Query:

datname        |  pid  | query_start                   | state  |     runtime     |                            query      

----------------------+-------+---------+------------------+-------------+-----------------+-------------------------------+-----------------------------

 postgres      | 15088 | 2022-07-01 05:56:22.729101+00 | active | 02:12:56.255367 | SELECT labelfile, spcmapfile, lsn FROM pg_stop_backup(false)


Example Vacuum:


<dbname>=# vacuum verbose analyze pg_database;

INFO:  vacuuming "pg_catalog.pg_database"

INFO:  "pg_database": found 0 removable, 5 nonremovable row versions in 1 out of 1 pages

DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 996487475

There were 0 unused item pointers.

Skipped 0 pages due to buffer pins, 0 frozen pages.

0 pages are entirely empty.

CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.

INFO:  analyzing "pg_catalog.pg_database"

INFO:  "pg_database": scanned 1 of 1 pages, containing 5 live rows and 0 dead rows; 5 rows in sample, 5 estimated total rows

VACUUM



Autovacuum from logs of previous issue (timestamp does not match because this is taken from last week logs):


022-06-26 00:00:21 UTC [32632]: [2-1] user=,db=,app=,client= HINT:  Close open transactions soon to avoid wraparound problems.

You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

2022-06-26 00:00:21 UTC [32632]: [3-1] user=,db=,app=,client= LOG:  automatic aggressive vacuum of table "postgres.pg_catalog.pg_statistic": index scans: 0

pages: 0 removed, 40 remain, 0 skipped due to pins, 0 skipped frozen

tuples: 0 removed, 1064 remain, 577 are dead but not yet removable, oldest xmin: 4200201112

buffer usage: 57 hits, 43 misses, 0 dirtied

avg read rate: 203.846 MB/s, avg write rate: 0.000 MB/s

system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

2022-06-26 00:00:21 UTC [32632]: [4-1] user=,db=,app=,client= WARNING:  oldest xmin is far in the past

2022-06-26 00:00:21 UTC [32632]: [5-1] user=,db=,app=,client= HINT:  Close open transactions soon to avoid wraparound problems.


Both xmin point to the long running pg_stop_backup. Not quite sure, but would this cause autovacuum to not kick-in on application tables?


Thanks,

Nikhil


On Fri, Jul 1, 2022 at 1:11 PM Nikhil Shetty <nikhil.dba04@xxxxxxxxx> wrote:
Hi Andrey,

./wal-g7 --version

wal-g version v1.1.2-rc 6af461f 2021.12.14_08:19:15 PostgreSQL


The only probable problem I can imagine is that we open transaction during pg_stop_backup() [0] to set a statement_timeout
Do you think this will prevent the autovacuum from kicking in ? 

I think this tx can be safely removed
How can I remove this? I can test whether this is the problem 

Thanks,
Nikhil

On Thu, Jun 30, 2022 at 5:18 PM Andrey Borodin <x4mmm@xxxxxxxxxxxxxx> wrote:
Hi!

> On 30 Jun 2022, at 14:23, Nikhil Shetty <nikhil.dba04@xxxxxxxxx> wrote:
>
> Hi,
>
> PostgreSQL version - 11.7
>
> We are using wal-g for backup/restore. We had a duration of high WAL generation due to application usage due to which archive backup was delayed.
>
> Backup was started through cron schedule and it didn't complete because it waits for the last WAL to be archived. Now, I know that long running transactions will prevent vacuuming of dead tuples but in this case we were able to run manual vacuums successfully but auto-vacuum never starts.
>
> As soon as we terminated the long running backup session, the auto-vacuum started kicking in.
>
> Wanted to understand why a backup operation is blocking the auto-vacuum?
>
> Wal-g uses non-exclusive backups and waits for the last wal to be archived.


As far as I know, WAL-G should not prevent running autovacuums on tables. WAL-G can be run against replication standby, where vacuum is not possible at all, so there must be no conflicts...

The only probable problem I can imagine is that we open transaction during pg_stop_backup() [0] to set a statement_timeout. I think this is superflous and unneeded. I think this tx can be safely removed. But anyway, this tx does not have a xid (otherwise it could not run on Standby).

BTW which version of WAL-G do you use?

Best regards, Andrey Borodin.

[0] https://github.com/wal-g/wal-g/blob/8b9b3ca06efb44be4724208dc6f6d11836bd34d9/internal/databases/postgres/queryRunner.go#L243

[Index of Archives]     [Postgresql Home]     [Postgresql General]     [Postgresql Performance]     [Postgresql PHP]     [Postgresql Jobs]     [PHP Users]     [PHP Databases]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Yosemite Forum]

  Powered by Linux