Wendy Cheng wrote:
..... [snip] ... There are many footprints of spin_lock, which is worrisome. Next time you have a hang, hit "sysrq-w" a couple of times in addition to sysrq-t. This should give traces of the threads that are actively on CPUs at that time. Also check your kernel change log to see whether GFS has any new patch touching spin locks that was not in the previous release.
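For what it's worth, if the console keyboard is awkward to reach during a hang, the same requests can be sent by writing the key character to /proc/sysrq-trigger as root. Below is a minimal sketch under that assumption; the helper names (send_sysrq, capture_hang) and the parameters (repeats, delay, trigger_path) are made up for illustration and are not part of any GFS or cluster tool. The traces themselves still land in the kernel log / console, not in the script.

```python
import time


def send_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write a single SysRq key character to the trigger file (needs root)."""
    if len(key) != 1:
        raise ValueError("a SysRq request is exactly one character")
    with open(trigger_path, "w") as trigger:
        trigger.write(key)


def capture_hang(keys=("w", "t"), repeats=2, delay=1.0,
                 trigger_path="/proc/sysrq-trigger"):
    """Send each key in `keys`, `repeats` times, pausing `delay` seconds
    between rounds so successive dumps can be compared.
    Returns the list of keys sent, in order."""
    sent = []
    for _ in range(repeats):
        for key in keys:
            send_sysrq(key, trigger_path)
            sent.append(key)
        time.sleep(delay)
    return sent
```

Calling capture_hang() as root during a hang sends "w" and "t" twice each; comparing the two rounds shows whether the same threads are still stuck in the same place.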
I re-read your console log a few minutes ago, then had a quick browse through the cluster git tree. A few of the python processes (e.g. pid 4104, 4105, etc.) are blocked on locks within gfs_readdir(). This appears to relate to a performance patch committed on 11/6/2008: gfs_getattr() has a piece of new code that touches vfs inode operations while a glock is held. That's an area that needs examination. I don't have the linux kernel source handy, though, to see whether that iput() and igrab() can lead to a deadlock.
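To illustrate the kind of hang I have in mind (and only as a generic user-space sketch, not a claim about what the GFS code actually does): if one path takes lock A and then wants lock B, while another path holds B and wants A, neither can make progress. The two locks and the path names below are stand-ins I made up; they are not the real glock or inode locks, and timeouts are used only so the demo terminates instead of hanging.

```python
import threading


def run_inversion(timeout=0.2):
    """Two threads take the same two locks in opposite order.
    Returns, per thread, whether it managed to get its second lock."""
    lock_a = threading.Lock()  # hypothetical stand-in for a glock
    lock_b = threading.Lock()  # hypothetical stand-in for an inode-side lock
    both_hold_first = threading.Barrier(2)  # both threads own their first lock
    both_tried = threading.Barrier(2)       # both threads finished their attempt
    results = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()
            # The other thread already owns `second`, so this times out.
            got = second.acquire(timeout=timeout)
            results[name] = got
            # Keep holding `first` until the other thread has also given up.
            both_tried.wait()
            if got:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("path_1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("path_2", lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Both entries in the returned dict come back False: each thread is stuck waiting for the lock the other one holds. In the kernel there is no timeout, so the equivalent situation would simply show up as blocked processes like the ones in your log.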
If you have the patch in your kernel and if you can, temporarily remove it (and rebuild the kernel) to see how it goes:
commit a71b12b692cac3a4786241927227013bf2f3bf99
Again, take my advice with a grain of salt :) ...I'll stop here. Good luck!
-- Wendy