On 05/10/2010 05:20 AM, Beast in Black wrote:
Greetings.

Every so often, when I'm writing via NFS to a loopback-mounted file, I find that about 10-15 nfsd threads (out of a total of 64) go into D state, along with the loop thread for that file, and never recover.

My setup is as follows:

1. A sparse file is created via dd and attached to a /dev/loopX device (where 0 <= X <= 100).
2. The loop device is formatted with mke2fs and mounted on the mount point "/volumes/localvol".
3. "/volumes/localvol" is exported with options *(rw,no_root_squash,no_subtree_check,async,insecure,nohide,no_wdelay).
4. /volumes/localvol is set as a network datastore (NFS mount) in ESX.
5. The virtual machine files for an ESX VM are copied into the NFS mount on ESX.
6. The virtual machine is powered on and I do some activity in it (write files, etc.).

At this point, the VM is running fine in ESX. After a while, however, I notice that the VM freezes and that ESX reports the NFS-mounted datastore as unreachable. When I check the NFS server machine, I find that 10-15 NFS threads are in D state, along with the thread for the associated loopback-mounted file. They never leave D state, and the only way out is to reboot the NFS server machine.

I have also tried specifying the export as "sync" instead of "async" (and removing no_wdelay), but I still see the same behavior.

The NFS server is running the vanilla 2.6.30 kernel on Ubuntu 8.10. The NFS exports are all NFSv3.

Does anyone have an idea of why this may be occurring? I would be glad to provide any additional info required.
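For anyone trying to reproduce this, the server-side setup described above might look roughly like the following. This is only a sketch under assumptions: the image path, its size, and the function names are hypothetical (the report gives only the mount point and the export options), and it assumes a losetup that supports "-f --show".

```shell
#!/bin/sh
# Hypothetical values: only MNT and the export options come from the report.
IMG=${IMG:-/srv/localvol.img}
SIZE_MB=${SIZE_MB:-1024}
MNT=${MNT:-/volumes/localvol}
EXPORT_OPTS=rw,no_root_squash,no_subtree_check,async,insecure,nohide,no_wdelay

# Create a sparse file of $2 MiB at path $1: count=0 writes no data,
# seek= only extends the file length, so no blocks are allocated.
make_sparse_image() {
    dd if=/dev/zero of="$1" bs=1M count=0 seek="$2" 2>/dev/null
}

# Attach, format, mount, and export the image. Must be run as root.
setup_export() {
    make_sparse_image "$IMG" "$SIZE_MB" || return 1
    dev=$(losetup -f --show "$IMG") || return 1   # first free /dev/loopX
    mke2fs -q "$dev" || return 1
    mkdir -p "$MNT"
    mount "$dev" "$MNT" || return 1
    exportfs -o "$EXPORT_OPTS" "*:$MNT"
}
```

Nothing runs until setup_export is called from a root shell. Comparing "ls -l" against "du -k" on the image afterwards shows how much of the sparse file has actually been allocated.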
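There may be a deadlock due to memory pressure on the server. When the server gets into the hung state, you might get some information by running "echo t | sudo tee /proc/sysrq-trigger" and then looking in your syslog for the resulting task dump. (A plain "sudo echo t > /proc/sysrq-trigger" fails, because the redirection is performed by your unprivileged shell rather than by sudo.)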
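A small helper for collecting that information could look like the sketch below. The function names are made up for illustration, and it assumes a procps-style ps that accepts "wchan:32". Note that the trigger file is spelled /proc/sysrq-trigger, with a hyphen.

```shell
#!/bin/sh
# Keep the header line plus every row whose STAT column starts with D
# (uninterruptible sleep).
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List D-state tasks together with the kernel function they are waiting in.
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}

# Ask the kernel to dump all tasks to the kernel log, then show the last
# $1 lines (default 200). Must be run from a root shell, because the
# redirection below is done by the calling shell.
dump_tasks() {
    echo t > /proc/sysrq-trigger || return 1
    dmesg | tail -n "${1:-200}"
}
```

Running list_d_state first shows which nfsd and loop threads are stuck and where. A full task dump can be larger than the kernel ring buffer, so the copy in syslog may be more complete than what dmesg shows.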
-- chuck[dot]lever[at]oracle[dot]com