Hello,

I'm trying to debug a problem we are seeing in our systems and was hoping to get some insight if possible.

We operate a fairly large cluster (~500) of Linux VMs running a modified version of OpenWRT 10.03 (Linux kernel version 2.6.32.27). These machines run Samba (version 3.5.11), which shares files from an NFS (over TCP) filesystem. The SMB daemons (and backing mounts) are brought up and down continuously throughout the day, and to make sure the take-down is timely, we kill -TERM the parent SMB process and lazily unmount (umount -l) the NFS share.

Most of the time, this works just fine. On occasion (usually once or twice per day) we end up with an SMB process that is stuck in, as best I can tell, nfs_getattr. At this point the CPU is 100% in io_wait. Our only recourse appears to be rebooting the system when this happens.

Other relevant facts:

1. I backported the local_lock=flock patches into this kernel.
2. The mount options of the NFS mount are rw,noatime,noexec,fg,hard,intr,tcp,rsize=32768,wsize=32768,local_lock=flock
3. This behavior seems like it may be new (or at least it has become noticeably worse) since we upgraded from the 8.09 OpenWRT release, which used the 2.6.25.20 kernel.
4. nfsstat doesn't indicate any network issues such as retransmits.

I can provide a SysRq-T trace of the SMB process as well as the System.map file from my kernel build if that helps. I have a few systems currently in this state, so I may be able to perform some live debugging.

Let me know if there is anything else I can provide. Thank you for any help.

- Pete
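For reference, here is a minimal sketch of how tasks stuck in uninterruptible sleep (state D, which is what gets accounted as I/O wait) could be listed together with the kernel symbol they are blocked in. It assumes Python 3 is available on the VM and that the kernel exposes /proc/<pid>/wchan, which depends on the kernel configuration. The function names are illustrative only and are not part of Samba, NFS, or any tool mentioned in this report. It scans process leaders only, not individual threads.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split around the first "(" and the last ")".
    lpar = stat_line.index("(")
    rpar = stat_line.rindex(")")
    pid = int(stat_line[:lpar].strip())
    comm = stat_line[lpar + 1:rpar]
    state = stat_line[rpar + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Read a small /proc file; return '' if it is missing or unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm, wchan) for every process currently in state D."""
    if not os.path.isdir(proc_root):
        return
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        stat_line = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat_line:
            continue  # process exited while we were scanning
        try:
            pid, comm, state = parse_stat(stat_line)
        except ValueError:
            continue  # unexpected stat format, skip it
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            yield pid, comm, wchan


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(pid, comm, wchan or "?")
```

Run as root on an affected VM, this would print one line per blocked process. The wchan column is only a hint about where the task is sleeping; a full SysRq-T trace remains the more reliable source for the call chain.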