On 2022-Jan-25, Michael Harris wrote: > We've recently updated our application to PG 14.1, and in the test instance we > have started to see some alarming undetected deadlocks. This is indeed suspicious / worrisome / curious. What version were you using previously? I reformatted the result sets: > An example of what we have seen is: > > locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart | relation > ----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+-----------------+---------+----------+-------------------------------+-------------------------------- > relation | 529986 | 1842228045 | | | | | | | | 165/1941408 | 2130531 | AccessShareLock | f | f | 2022-01-19 00:32:32.626152+01 | st.ctr_table_efr_oa > (1 row) > > locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart | relation > ----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+---------------------+---------+----------+-----------+-------------------------------- > relation | 529986 | 1842228045 | | | | | | | | 75/2193719 | 2128603 | AccessExclusiveLock | t | f | | st.ctr_table_efr_oa > (1 row) > > locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart | relation > ----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+---------------------+---------+----------+-------------------------------+----------- > relation | 529986 | 1842231489 | | | | | | | | 75/2193719 | 2128603 | AccessExclusiveLock | f | f | 2022-01-19 00:32:32.924694+01 | st.tpd_oa > (1 row) > > locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart | relation > ---------------+----------+------------+------+-------+--------------+---------------+---------+-----------+----------+--------------------+---------+-----------------------+---------+----------+-------------------------------+----------- > relation | 529986 | 1842231489 | | | | | | | | 165/1941408 | 2130531 | AccessShareLock | t | f | | st.tpd_oa > > So: > pid 2130531 waits for an AccessShareLock on relation 1842228045, blocked by pid 2128603 which holds an AccessExclusiveLock > pid 2128603 waits for an AccessExclusiveLock on relation 1842231489, blocked by pid 2130531 which holds an AccessShareLock > > The queries being executed by these backends are: > > pid | query_start | state_change | wait_event_type | wait_event | state | query > ---------+-------------------------------+-------------------------------+-----------------+------------+--------+------------------------------------------------------------------------- > 2128603 | 2022-01-19 00:32:32.924413+01 | 2022-01-19 00:32:32.924413+01 | Lock | relation | active | DROP TABLE st.tpd_oa_18929 > 2130531 | 2022-01-19 00:32:32.625706+01 | 2022-01-19 00:32:32.625708+01 | Lock | relation | active | DELETE FROM st.ctr_table_efr_oa_19010 WHERE ropid = 44788868 > (2 rows) I know of no cases in which we fail to detect a deadlock. Perhaps you have indeed hit a bug. > Note that there were a lot of other processes also waiting on relation > 1842231489 - could that be confusing the deadlock detection routine? It shouldn't. > I am also confused about the locks which are being taken out by the > DELETE query. Maybe the lock is already taken before the DELETE is run; do you have any triggers, rules, constraints, or anything? If you have seen this several times already, maybe a way to investigate deeper is an exhaustive log capture of everything that these transactions do, from the point they begin until they become blocked (log_statement=all). Perhaps you need to involve other concurrent transactions in order to cause the problem. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "Tiene valor aquel que admite que es un cobarde" (Fernandel)