(By the way, please try not to start new threads or drop people from the cc.)

On Sat, Feb 07, 2009 at 08:49:27PM +0000, Gary L. Grobe wrote:
> > Then send us those traces. Please try to avoid wordwrapping them in
> > the email.
>
> Ok, so I run my simulations, and then run the following script.
>
> --- This was run on the master node
> echo w > /proc/sysrq-trigger
> dmesg -c -s 1000000 > foo
>
> I've captured these stuck tasks on both the master and the slave node,
> on separate runs. So the nfsd stacks are all identical:
>
> nfsd          D 00000001079318f8     0  5832      2
>  ffff880837a23c70 0000000000000046 0000000000000000 0000000000000002
>  ffff88083aef5950 ffff88083cda5790 ffff88083aef5b80 0000000437a23c80
>  00000000ffffffff 00000001079318fb 0000000000000000 0000000000000000
> Call Trace:
>  [<ffffffff8056e8e3>] schedule_timeout+0x8a/0xad
>  [<ffffffff80237e8b>] process_timeout+0x0/0x5
>  [<ffffffff8056e8de>] schedule_timeout+0x85/0xad
>  [<ffffffff80238178>] msleep+0x14/0x1e
>  [<ffffffff8030235d>] nfsd_vfs_write+0x221/0x2dd
>  [<ffffffff8028222b>] __dentry_open+0x14c/0x23b
>  [<ffffffff80302a93>] nfsd_write+0xc5/0xe2
>  [<ffffffff803003bd>] nfsd_proc_write+0xc5/0xde
>  [<ffffffff80307ac1>] decode_fh+0x1c/0x45
>  [<ffffffff802fe8f3>] nfsd_dispatch+0xde/0x1c2
>  [<ffffffff805456f5>] svc_process+0x408/0x6e7
>  [<ffffffff8056f920>] __down_read+0x12/0x93
>  [<ffffffff802fef26>] nfsd+0x1b7/0x285
>  [<ffffffff802fed6f>] nfsd+0x0/0x285
>  [<ffffffff80240d34>] kthread+0x47/0x73
>  [<ffffffff8022bd1d>] schedule_tail+0x27/0x5f
>  [<ffffffff8020c109>] child_rip+0xa/0x11
>  [<ffffffff80240ced>] kthread+0x0/0x73
>  [<ffffffff8020c0ff>] child_rip+0x0/0x11

Presumably that's the msleep(10) in nfsd_vfs_write(). That wouldn't explain the same nfsd thread waiting for several seconds, though. Or was it just that, during those several seconds, different nfsd threads were stuck in D, not necessarily the same ones?

What does your /etc/exports file look like on the server, and what are the mount options on the client?
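(For the client side, the effective mount options are visible in /proc/mounts. A minimal sketch of pulling them out; the example line below is hypothetical, your actual server name, path, and options will differ:)

```shell
# /proc/mounts format: device mountpoint fstype options dump pass
# Hypothetical NFS entry; on a real client, read the file itself:
#   grep nfs /proc/mounts
line='master:/export /mnt/nfs nfs rw,vers=3,wsize=32768,hard 0 0'

# The 4th field is the comma-separated option list we're asking about.
opts=$(echo "$line" | awk '{print $4}')
echo "$opts"
```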
You could try turning off that msleep() with no_wdelay, but it may not help.

The more likely explanation is that you just switched to a more recent distro where "sync" (as opposed to "async") is the default export option. Depending on workload, "async" may improve performance a great deal, at the expense of possible data corruption on server reboot!

If you're doing a lot of writing and using NFSv2, then switching to NFSv3 may give you performance close to the "async" performance without the corruption worries.

--b.
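(For illustration, the three export variants mentioned above would look something like this in /etc/exports; the path and client names are hypothetical:)

```
# /etc/exports on the server -- hypothetical path and clients
/export  slave1(rw,sync)             # current default: safe, but slower writes
/export  slave2(rw,sync,no_wdelay)   # sync, but without the write-gathering delay
/export  slave3(rw,async)            # fastest; risks corruption on server reboot
```

(Run "exportfs -ra" after editing to re-export. Note no_wdelay has no effect when async is set, since async already replies before committing to disk.)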