Hi again Some good news. After some more debugging & reflection, I realized that the likely cause is one of our own libraries that gets loaded as part of some custom functions we are using. Some of these functions trigger fetching of remote resources, for which a timeout is set using `alarm`. The function unfortunately does not re-establish any pre-existing interval timers after it is done, which leads to postgresql missing it's own expected alarm signal. The reason that this was not affecting us on previous postgres versions was this commit: https://github.com/postgres/postgres/commit/09cf1d52267644cdbdb734294012cf1228745aaa#diff-b12a7ca3bf9c6a56745844c2670b0b28d2a4237741c395dda318c6cc3664ad4a After this commit, once an alarm is missed, that backend never sets one again, so no timeouts of any kind will work. Therefore, the deadlock detector was never being run. Prior to that, the next time any timeout was set by the backend it would re-establish it's timer. We will of course fix our own code to prevent this issue, but I am a little concerned at the above commit as it reduces the robustness of postgres in this situation. Perhaps I will raise it on the pgsql-hackers list. Cheers Mike On Tue, 1 Feb 2022 at 17:50, Michael Harris <harmic@xxxxxxxxx> wrote: > > Hi > > The undetected deadlock occurred again today and I was able to collect > some more info. > > The presentation was very similar to the case I reported previously: > - One backend trying to do a DROP TABLE on a partition of a > partitioned table, waiting for an AccessExclusiveLock on that table > - Another backend trying to do a DELETE on records within a partition > of another table, waiting on an AccessShareLock which is already held > by the first table and already holding a lock on the table that the > first backend is waiting for > - A load of other backends also trying to do a DELETE. > > I was able to attach gdb to the backends and I discovered a few things. > > First, taking a stack trace of the process doing the DELETE, it looks like this: > > #0 0x00007fc82a6750bb in epoll_wait (epfd=13, events=0x18696f8, > maxevents=maxevents@entry=1, timeout=timeout@entry=-1) at > ../sysdeps/unix/sysv/linux/epoll_wait.c:30 > #1 0x00000000007d0c32 in WaitEventSetWaitBlock (nevents=1, > occurred_events=<optimized out>, cur_timeout=-1, set=0x1869698) at > latch.c:1450 > #2 WaitEventSetWait (set=0x1869698, timeout=<optimized out>, > occurred_events=<optimized out>, nevents=1, wait_event_info=<optimized > out>) at latch.c:1396 > #3 0x00000000007d0f94 in WaitLatch (latch=<optimized out>, > wakeEvents=wakeEvents@entry=33, timeout=timeout@entry=0, > wait_event_info=50331648) at latch.c:473 > #4 0x00000000007eaea7 in ProcSleep > (locallock=locallock@entry=0x24571d0, > lockMethodTable=lockMethodTable@entry=0xdb2360 <default_lockmethod>) > at proc.c:1361 > #5 0x00000000007df419 in WaitOnLock > (locallock=locallock@entry=0x24571d0, owner=owner@entry=0x18b80f8) at > lock.c:1858 > #6 0x00000000007e052c in LockAcquireExtended > (locktag=locktag@entry=0x7ffced1d0b80, lockmode=lockmode@entry=1, > sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false, > reportMemoryError=reportMemoryError@entry=true, > locallockp=locallockp@entry=0x7ffced1d0b78) at lock.c:1100 > #7 0x00000000007ddb23 in LockRelationOid (relid=1842227607, > lockmode=1) at lmgr.c:117 > #8 0x000000000050dba5 in relation_open (relationId=1842227607, > lockmode=lockmode@entry=1) at relation.c:56 > #9 0x00000000008fc838 in generate_partition_qual (rel=0x56614e8) at > partcache.c:361 > #10 generate_partition_qual (rel=0x56614e8) at partcache.c:336 > #11 0x00000000007502bf in set_baserel_partition_constraint > (relation=relation@entry=0x56614e8, rel=<optimized out>, > rel=<optimized out>) at plancat.c:2402 > ~snip~ > > The lock is being taken out by the function generate_partition_qual > (in partcache.c) which has this comment: > > /* > * Grab at least an AccessShareLock on the parent table. Must do this > * even if the partition has been partially detached, because transactions > * concurrent with the detach might still be trying to use a partition > * descriptor that includes it. > */ > > This is happening during query planning rather than execution, where > it is trying to fetch the partition bound expression. > It explains why this lock is not always required (the result is cached > in the relcache, and anyway it looks to me like this code path is > called depending on factors such as the row estimates). > > The second thing was that the DROP process, and all the DELETE > processes, were all waiting inside epoll_wait ultimately called from > ProcSleep as shown above. It is sitting in this section of code > (proc.c line 1361): > > (void) WaitLatch(MyLatch, WL_LATCH_SET | > WL_EXIT_ON_PM_DEATH, 0, <--- Processes sitting > here > PG_WAIT_LOCK | locallock->tag.lock.locktag_type); > ResetLatch(MyLatch); > /* check for deadlocks first, as that's probably log-worthy */ > if (got_deadlock_timeout) > > <-- set breakpoint here > { > CheckDeadLock(); > got_deadlock_timeout = false; > } > > I set a breakpoint on the next line after ResetLatch is called, and it > was never reached. I thought that could be due to GDB attaching to the > process and disrupting the alarm signal that should be sent into the > timeout handler, so I manually sent it a SIGALRM. > At that point the breakpoint was triggered - but the flag > got_deadlock_timeout is not set, so the deadlock check is not > executed. > This led me to believe that the deadlock check timeout wasn't set > properly at all. > > To see if the deadlock would be detected if CheckDeadlock() got an > opportunity to run, I manually set got_deadlock_timeout, and sure > enough the deadlock was detected and released. > > I concluded that the deadlock detection function is able to detect > these deadlocks, but for some reason the 1sec timeout is not being > correctly scheduled so it never runs. > When it is artificially triggered, it does release the deadlock. > > After the release, a new deadlock formed between the DROP process and > another one of the waiting DELETE processes (not an entirely > unexpected outcome). > I will leave it in that state for as long as possible in case anyone > can think of any other info that should be gathered. Luckily it is a > test instance. > > Sorry for the slightly long and detailed email - let me know if this > should be moved to another forum. > > Cheers > Mike