I've been trying to make use of deadlock detection in libdlm, but without any luck so far. I'm hoping someone can tell me what I'm doing wrong, or how to debug this further.

My test code looks like this:

    #include <sys/types.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define _REENTRANT
    #include <libdlm.h>

    void lock(struct dlm_lksb *l, const char *name, int mode) {
        printf("[%d] Attempting to lock %s, mode %d\n",getpid(),name,mode);
        int status = dlm_lock_wait(LKM_NLMODE, l, LKF_EXPEDITE, name,
                                   strlen(name), 0, NULL, NULL, NULL);
        if(status != 0) abort();
        status = dlm_lock_wait(mode, l, LKF_CONVERT | LKF_CONVDEADLK, name,
                               strlen(name), 0, NULL, NULL, NULL);
        if(status == 0) status = l->sb_status;
        printf("[%d] Status was %d\n",getpid(),status);
    }

    int main(void) {
        pid_t pid = fork();
        if(pid == 0) {
            // child process
            if(dlm_pthread_init() != 0) abort();
            struct dlm_lksb l1,l2;
            memset(&l1,0,sizeof(l1));
            memset(&l2,0,sizeof(l2));
            lock(&l1,"A",LKM_PRMODE);
            lock(&l2,"B",LKM_EXMODE);
            dlm_unlock_wait(l1.sb_lkid,0,&l1);
            dlm_unlock_wait(l2.sb_lkid,0,&l2);
            return EXIT_SUCCESS;
        } else {
            // parent process
            if(dlm_pthread_init() != 0) abort();
            struct dlm_lksb l1,l2;
            memset(&l1,0,sizeof(l1));
            memset(&l2,0,sizeof(l2));
            lock(&l1,"B",LKM_PRMODE);
            sleep(5); // wait to ensure child has grabbed A
            lock(&l2,"A",LKM_EXMODE);
            dlm_unlock_wait(l2.sb_lkid,0,&l2);
            dlm_unlock_wait(l1.sb_lkid,0,&l1);
        }
        return EXIT_SUCCESS;
    }

This should cause a classic deadlock: process 1 is waiting on resource A, which is locked by process 2. Process 2 is waiting on resource B, which is locked by process 1.

From the manpage, I would expect this to be detected and resolved by one of the lock requests being refused:

    "Return values
    *snip*
    EDEADLOCK  The lock operation is causing a deadlock and has been
               cancelled. If this was a conversion then the lock is
               reverted to its previously granted state. If it was a
               new lock then it has not been granted.
               (NB Only conversion deadlocks are currently detected)"

But instead, the process hangs indefinitely, until I kill it:

    $ ./a.out
    [27986] Attempting to lock A, mode 3
    [27985] Attempting to lock B, mode 3
    [27986] Status was 0
    [27986] Attempting to lock B, mode 5
    [27985] Status was 0
    [27985] Attempting to lock A, mode 5
    <hangs here>

Here's the output of lockdump:

    $ /sbin/dlm_tool lockdump default
    id 01aa0005 gr PR rq IV pid 27986 master 2 "A"
    id 034f0004 gr NL rq EX pid 27985 master 2 "A"
    id 03630001 gr PR rq IV pid 27985 master 4 "B"
    id 02070004 gr NL rq EX pid 27986 master 4 "B"

and lockdebug:

    $ /sbin/dlm_tool lockdebug default
    Resource ffff810c1f02c080 Name (len=1) "A"
    Local Copy, Master is node 2
    Granted Queue
    01aa0005 PR Master: 03b80003
    Conversion Queue
    034f0004 NL (EX) Master: 02310005
    Waiting Queue

    Resource ffff810c1f02cc80 Name (len=1) "B"
    Local Copy, Master is node 4
    Granted Queue
    03630001 PR Master: 030c0001
    Conversion Queue
    02070004 NL (EX) Master: 03530003
    Waiting Queue

The machine I'm using is running RHEL5.