On Wed, Jan 31, 2024 at 2:05 PM abrahim abrahao <a_abrahao@xxxxxxxxxxxx> wrote:
There is a three-day vacuum freeze on a partition table with wait_event = BufferPin, no transactions (active or idle) older than it that are not blocked, but at least one select query is executing at all times related of this partition table....
Is there a wait to figure out which session vacuum freeze to wait for?
The vacuum needs a chance to get in and make changes to the table, but it's not being blocked at the traditional lock level that shows up in pg_blocking_pids. You can see what is going on with this:
select pid, mode, query_start, query from pg_locks join pg_stat_activity using (pid) where relation::regclass::text = 'mytable' order by 3;
That may show your vacuum process with a ShareUpdateExclusiveLock and some other processes with other locks, probably AccessShareLock. Those other pids need to all finish or be killed - and not have any overlap between them. In other words, that vacuum process needs to have exclusive access to the table for a split second, no matter if the other process locked the table before or after the vacuum started. One crude solution would be to cancel any other existing backends interested in that table:
select pg_cancel_backend(pid), now()-query_start, query from pg_locks join pg_stat_activity using (pid) where relation::regclass::text = 'mytable' and lower(query) !~ 'vacuum';
Not a good long-term solution, but depending on how often the table is updated, you might have other options. Perhaps disable autovacuum for this table and do a manual vacuum (e.g. in a cron script) that kills the other backends as per above, or runs during a time with not-constant reads on the table. Or have something that is able to pause the application. Or if this is a partitioned table that might get dropped in the future or at least not queried heavily, do not worry about vacuuming it now.
Cheers,
Greg