On Fri, 2004-10-15 at 16:21, Daniel McNeil wrote:
> Hey all,
>
> I am testing gfs on 2.6.8.1.  I have 3 machines connected
> to shared fibre channel storage.  Currently I have 2 nodes
> in the cluster and gfs on a 5 disk stripe mounted on 2 nodes.
>
> I was running 'tar xvf /views/linux-2.6.8.1' on each
> node in separate directories of the same gfs file system.
>
> 1 tar finished, but the other is stuck.
>
> cat /proc/6601/wchan shows glock_wait_internal
>
> Is there any way to pull off more info that might be useful
> in debugging this?

I have a bit more info, but not as much as I wanted.

I left the cluster with the tar hung over the weekend.  The cluster
was configured for 3 nodes, but only 2 nodes were up, with gfs
mounted on each.

Sometime over the weekend, the nodes lost communication (maybe a
network glitch).  Both nodes got:

    CMAN: quorum lost, blocking activity

I did a cman_tool join on the 3rd node.  It joined with the node
that happened to have the hung tar, and it got:

    CMAN: quorum regained, resuming activity

However, the tar remained hung.

I rebooted the 1st node and had it join the cluster.
cat /proc/cluster/{status,nodes} showed that all three had joined
the cluster, but the tar was still hung.

On the node with the hung tar, cat /proc/cluster/services also hung.

I started a 2nd mount of the gfs file system on the 2nd node, and
that mount hung.  I then rebooted the node with the hung tar, and
the mount on the 2nd node completed.

I did not have SYSRQ enabled, so I could not get stack traces
from the hung tar.  I'm rebooting on a kernel with SYSRQ enabled
and will keep testing.

Daniel
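
For reference, a minimal sketch of how the state of a hung process
might be captured the next time this happens.  The function name
hung_task_info and the PROC_ROOT override are hypothetical, introduced
only for illustration.  The sysrq lines are left as comments because
they need root and a kernel built with magic SysRq support.

```shell
#!/bin/sh
# Sketch only: print the wait channel and scheduler state of one process.
# PROC_ROOT is a hypothetical override (defaults to /proc) so the function
# can be tried against a fake directory tree.
hung_task_info() {
    pid=$1
    root=${PROC_ROOT:-/proc}
    if [ ! -d "$root/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    printf 'pid:   %s\n' "$pid"
    # wchan names the kernel function the task is sleeping in
    printf 'wchan: %s\n' "$(cat "$root/$pid/wchan" 2>/dev/null)"
    # the State: line of status shows e.g. "D (disk sleep)" for a blocked task
    printf 'state: %s\n' "$(sed -n 's/^State:[[:space:]]*//p' "$root/$pid/status" 2>/dev/null)"
}

# With magic SysRq available, stack traces of all tasks can be requested
# as root (assumption: the kernel was built with SysRq support):
#   echo 1 > /proc/sys/kernel/sysrq
#   echo t > /proc/sysrq-trigger
#   dmesg
```

Usage would be something like `hung_task_info 6601` on the node with
the stuck tar, run before rebooting it.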