On Fri, Jan 04, 2008 at 04:18:45PM -0500, Charlie Brady wrote:
> We've reduced the application code to a simple test case. The following
> code run on each node will soon block, and doesn't receive signals until
> the peer node is shutdown:
>
> ...
> fl.l_whence=SEEK_SET;
> fl.l_start=0;
> fl.l_len=1;
>
> while (1)
> {
>     fl.l_type=F_WRLCK;
>     retval=fcntl(filedes,F_SETLKW,&fl);
>     if (retval==-1)
>     {
>         perror("lock");
>         exit(1);
>     }
>     // attempt to unlock the index file
>     fl.l_type=F_UNLCK;
>     retval=fcntl(filedes,F_SETLKW,&fl);
>     if (retval==-1)
>     {
>         perror("unlock");
>         exit(1);
>     }
> }

Yes, this stresses a problematic design limitation in the RHEL4 dlm
where the dlm master node is ping-ponging all over the place and becomes
so unstable that everything comes to a halt.

One possible work-around is to modify the program to hold a lock on
filedes to keep the master stable, e.g. hold a zero length lock at some
unused offset like 0xFFFFFF.

Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
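
A minimal sketch of the work-around described above, not a tested fix for the RHEL4 dlm. The helper names hold_anchor_lock() and lock_unlock_once() and the ANCHOR_OFFSET macro are hypothetical. Two choices here are assumptions rather than anything stated in the thread: the held lock is a shared (F_RDLCK) lock so that every node can hold it at the same time, and "zero length" is taken literally as l_len = 0, which fcntl treats as "from l_start to the end of the file". The descriptor must be open for reading for F_RDLCK to succeed.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical offset for the long-lived lock: any range the
 * application never locks for real work. */
#define ANCHOR_OFFSET 0xFFFFFF

/* Take a lock at an unused offset and keep it for the life of the
 * descriptor. Returns 0 on success, -1 on failure (errno is set). */
int hold_anchor_lock(int filedes)
{
    struct flock fl = {0};

    fl.l_type = F_RDLCK;        /* assumption: shared, so all nodes can hold it */
    fl.l_whence = SEEK_SET;
    fl.l_start = ANCHOR_OFFSET;
    fl.l_len = 0;               /* fcntl: 0 means "through end of file" */

    /* Non-blocking request: a shared lock should not have to wait. */
    return fcntl(filedes, F_SETLK, &fl);
}

/* One iteration of the lock/unlock cycle from the quoted test case.
 * Returns 0 on success, -1 on failure. */
int lock_unlock_once(int filedes)
{
    struct flock fl = {0};

    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;

    fl.l_type = F_WRLCK;
    if (fcntl(filedes, F_SETLKW, &fl) == -1) {
        perror("lock");
        return -1;
    }

    fl.l_type = F_UNLCK;
    if (fcntl(filedes, F_SETLKW, &fl) == -1) {
        perror("unlock");
        return -1;
    }
    return 0;
}
```

The intended use is to call hold_anchor_lock(filedes) once, right after opening the file, and then run the existing while (1) loop (or lock_unlock_once() repeatedly) unchanged. The anchor lock is never released explicitly; it goes away when the process exits or closes any descriptor for that file.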