On Thu, 2009-08-27 at 09:25 -0500, Johnson, Eric wrote:
>> I have a 32-bit RHEL 5.3 Cluster Suite setup of two nodes with GFS2 file
>> systems on FC attached SAN. I have run into this issue twice now, where
>> attempts to access a certain directory within one of the GFS2 file
>> systems never return. Other directories and paths within that file
>> system work just fine.
>>
>> The first time it happened, I had to crash the node to get it to release
>> the FS, then unmount it on both nodes, fsck it, remount it, and it was
>> fine. It has happened again (different path, different file system). A
>> simple "ls" in the directory (which has maybe 20 files in it) leaves the
>> process in an uninterruptible sleep state. I left it all night and it
>> never returned.
>>
>> I'm not sure what other info would be useful on this, but this is what I
>> see from a gfs2_tool lockdump output for ls PID on that node:
>>
>> G: s:UN n:2/bf1df f:l t:SH d:EX/0 l:0 a:0 r:4
>> H: s:SH f:aW e:0 p:9938 [ls] gfs2_lookup+0x44/0x90 [gfs2]
>             ^ The W flag indicates that this is waiting for a glock
>
>Currently the glock is in the UN (unlocked) state, and it's trying to get
>a SH (shared) lock. The next step in the investigation is to look for
>the same glock number 2/bf1df on the other nodes, and see what is
>holding that lock. This particular node will hang until the lock is
>released on whichever other node is holding it.
>
>If there is nothing on any other node apparently holding that lock in
>the glock dumps, then looking at dlm lock dumps would be the next step.
>
>Steve.

Thanks for the response, Steve. I found this reference to that lock on
the other node:

G: s:EX n:2/bf1df f:dy t:EX d:SH/0 l:0 a:0 r:4
I: n:1155192/782815 t:8 f:0x00000010

I'm having trouble finding documentation that describes what each of
these fields means. There's no obvious process ID here, and all I can
determine is that it's an exclusive lock.
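
For comparing dumps from several nodes, the matching step Steve describes (look for the same glock number in each node's dump) can be scripted. The following is only a minimal sketch: it assumes the lockdump text from each node has already been saved as a string, it splits every `key:value` token generically without decoding what the fields mean, and it relies only on the point made above that a `W` in a holder's `f:` flags marks a waiter. The function names are hypothetical.

```python
def parse_dump_line(line):
    """Split one dump line, e.g. 'G: s:UN n:2/bf1df ...', into (kind, fields)."""
    kind, _, rest = line.strip().partition(":")
    fields = {}
    for token in rest.split():
        key, sep, value = token.partition(":")
        if sep:  # tokens such as '[ls]' carry no key and are skipped
            fields[key] = value
    return kind.strip(), fields


def find_glock(dump_text, glock_number):
    """Return the parsed G: line for glock_number and the H:/I: lines after it."""
    entries = []
    in_match = False
    for line in dump_text.splitlines():
        if not line.strip():
            continue
        kind, fields = parse_dump_line(line)
        if kind == "G":
            # A new G: line starts a new glock record.
            in_match = fields.get("n") == glock_number
        if in_match:
            entries.append((kind, fields))
    return entries


def waiting_holders(entries):
    """Holder (H:) lines whose flags include W, i.e. still waiting for the glock."""
    return [f for kind, f in entries if kind == "H" and "W" in f.get("f", "")]


# The two dumps quoted in this thread.
node_a = (
    "G: s:UN n:2/bf1df f:l t:SH d:EX/0 l:0 a:0 r:4\n"
    " H: s:SH f:aW e:0 p:9938 [ls] gfs2_lookup+0x44/0x90 [gfs2]\n"
)
node_b = (
    "G: s:EX n:2/bf1df f:dy t:EX d:SH/0 l:0 a:0 r:4\n"
    " I: n:1155192/782815 t:8 f:0x00000010\n"
)

on_a = find_glock(node_a, "2/bf1df")
on_b = find_glock(node_b, "2/bf1df")
```

With the two dumps above, `on_a` shows the glock in state `UN` with one waiting holder (the `ls` PID), while `on_b` shows the same glock number in state `EX` with no `H:` line at all, which matches what I see by eye.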

Eric

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster