NFS silent death

Hello,

I am stuck on an NFS problem.

There is a server which serves /home via NFSv4 and the root filesystem via NFSv3.
There are a number of diskless clients.

At some point, some of the clients hang.
Most of the time there is nothing in the logs or on the console.

I tried kernels 2.6.27 through 2.6.30 on both sides. NFSv4 on its own looks stable enough: if I only mount /home on a diskless host, there are no hangs for a couple of months. But once I start mixing the two, the problems begin. Putting the NFS root on NFSv4 is also a problem, or at least all my attempts failed: after some time it always ends up with broken ID mapping. And with 2.6.27-2.6.29, NFSv3 used to be completely unusable if you need to write. All my experiments with different mount and export options ended up with the same write problem:
echo "test" > test_file
fails with "Invalid argument" when the file does not already exist.
So I had to mix them.
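To check whether the create failure still occurs on a given kernel, the shell redirect can be reproduced with explicit system calls, which also shows which step (open, write or close) returns the error. This is a minimal sketch, assuming it is run from inside the NFSv3-mounted directory; the function names try_create and report are hypothetical and not part of any NFS tooling.

```python
import errno
import os
from typing import Tuple


def try_create(path: str, data: bytes = b"test\n") -> Tuple[str, int]:
    """Mimic `echo "test" > path` with explicit system calls.

    Returns ("ok", 0) on success, otherwise (step, errno) where step is
    "open", "write" or "close".
    """
    try:
        # A shell `>` redirect opens with O_WRONLY | O_CREAT | O_TRUNC, mode 0666 minus umask.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    except OSError as e:
        return ("open", e.errno)
    try:
        os.write(fd, data)
    except OSError as e:
        try:
            os.close(fd)
        except OSError:
            pass  # the write error is the one worth reporting
        return ("write", e.errno)
    try:
        # On NFS, errors from flushing cached writes can surface only at close().
        os.close(fd)
    except OSError as e:
        return ("close", e.errno)
    return ("ok", 0)


def report(path: str) -> str:
    """Human-readable result; the failure was seen only for files that did not exist yet."""
    if os.path.exists(path):
        return f"{path} already exists; use a new file name"
    step, err = try_create(path)
    if step == "ok":
        return f"created {path} OK"
    name = errno.errorcode.get(err, str(err))
    return f"{step}() on {path} failed: {name} ({os.strerror(err)})"
```

Calling report("test_file") in a fresh directory on the mount prints either a success line or the failing step together with the errno name, so an EINVAL can be attributed to open(), write() or close() rather than just to the redirect as a whole.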

So far I have moved to the 2.6.30 kernel and finally got something in the logs, but only for one of the clients; the rest still die silently.


[ 9363.096013] INFO: task rpciod/0:745 blocked for more than 120 seconds.
[ 9363.102730] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9363.110905] rpciod/0      D ffffc20000010e00     0   745      2
[ 9363.117049] ffff8805244a9b10 0000000000000046 ffff8805244a9aa0 0000000000010e00
[ 9363.124876] 0000000000010e00 0000000000010e00 0000000000010e00 0000000000010e00
[ 9363.132689] 0000000000010e00 0000000000010e00 ffff880524a667c0 ffff880124192840
[ 9363.140506] Call Trace:
[ 9363.143138]  [<ffffffff805e4bf0>] schedule+0x1c/0x44
[ 9363.148303]  [<ffffffffa011b720>] nfs_idmap_id+0x1ed/0x287 [nfs]
[ 9363.154567]  [<ffffffffa011b7f3>] nfs_map_group_to_gid+0x39/0x4f [nfs]
[ 9363.161332]  [<ffffffffa011227d>] decode_attr_group+0x110/0x1af [nfs]
[ 9363.168011]  [<ffffffffa011277a>] decode_getfattr+0x45e/0x960 [nfs]
[ 9363.174509]  [<ffffffffa01175ef>] nfs4_xdr_dec_open+0xa3/0xef [nfs]
[ 9363.181020]  [<ffffffffa00776eb>] rpcauth_unwrap_resp+0x89/0xac [sunrpc]
[ 9363.187935]  [<ffffffffa006f857>] call_decode+0x14e/0x1d3 [sunrpc]
[ 9363.194326]  [<ffffffffa00768f3>] __rpc_execute+0x93/0x278 [sunrpc]
[ 9363.200809]  [<ffffffffa0076b4d>] rpc_async_schedule+0x23/0x39 [sunrpc]
[ 9363.207624]  [<ffffffff8026926b>] run_workqueue+0xc9/0x189
[ 9363.213300]  [<ffffffff80269415>] worker_thread+0xea/0x10f
[ 9363.218972]  [<ffffffff8026de90>] kthread+0x69/0xac
[ 9363.224022]  [<ffffffff8020d26a>] child_rip+0xa/0x20
[ 9363.229174] INFO: task ntpd:4161 blocked for more than 120 seconds.
[ 9363.235620] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9363.243787] ntpd          D ffffc20000010e00     0  4161      1
[ 9363.249931] ffff880125831bf8 0000000000000082 ffff880126b49ad0 0000000000010e00
[ 9363.257758] 0000000000010e00 0000000000010e00 0000000000010e00 0000000000010e00
[ 9363.265583] 0000000000010e00 0000000000010e00 ffff880122c380c0 ffff8801264184c0
[ 9363.273419] Call Trace:
[ 9363.276035]  [<ffffffff805e4bf0>] schedule+0x1c/0x44
[ 9363.281174]  [<ffffffff805e4c8e>] io_schedule+0x76/0xd0
[ 9363.286581]  [<ffffffff802c4c1c>] sync_page+0x54/0x6c
[ 9363.291815]  [<ffffffff802c4cb6>] __lock_page+0x82/0xb8
[ 9363.297224]  [<ffffffff802c4e7a>] find_lock_page+0x48/0x82
[ 9363.302892]  [<ffffffff802c57a4>] filemap_fault+0x183/0x346
[ 9363.308647]  [<ffffffff802dc7a6>] __do_fault+0x77/0x449
[ 9363.314059]  [<ffffffff802df33c>] handle_mm_fault+0x1fe/0x31b
[ 9363.319986]  [<ffffffff805e96f2>] do_page_fault+0x273/0x29e
[ 9363.325741]  [<ffffffff805e6f95>] page_fault+0x25/0x30
[ 9363.331068]  [<00007fdd26f110e0>] 0x7fdd26f110e0
[ 9363.335865] INFO: task bash:4261 blocked for more than 120 seconds.
[ 9363.342311] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9363.350480] bash          D ffffc20000810e00     0  4261   4259
[ 9363.356611] ffff880925175468 0000000000000082 0000000000000088 0000000000010e00
[ 9363.364441] 0000000000010e00 0000000000010e00 0000000000010e00 0000000000010e00
[ 9363.372253] 0000000000010e00 0000000000010e00 ffff88092649e780 ffff880d26522040
[ 9363.380078] Call Trace:
[ 9363.382709]  [<ffffffff805e550d>] __mutex_lock_common+0x159/0x1fc
[ 9363.388985]  [<ffffffff805e55d7>] __mutex_lock_slowpath+0x27/0x3d
[ 9363.395262]  [<ffffffff805e5280>] mutex_lock+0x25/0x53
[ 9363.400600]  [<ffffffffa011b5c9>] nfs_idmap_id+0x96/0x287 [nfs]
[ 9363.406772]  [<ffffffffa011b842>] nfs_map_name_to_uid+0x39/0x4f [nfs]
[ 9363.413462]  [<ffffffffa01120ce>] decode_attr_owner+0x110/0x1af [nfs]
[ 9363.420140]  [<ffffffffa0112c02>] decode_getfattr+0x8e6/0x960 [nfs]
[ 9363.426642]  [<ffffffffa01131d9>] nfs4_xdr_dec_access+0xfd/0x11e [nfs]
[ 9363.433408]  [<ffffffffa00776eb>] rpcauth_unwrap_resp+0x89/0xac [sunrpc]
[ 9363.440324]  [<ffffffffa006f857>] call_decode+0x14e/0x1d3 [sunrpc]
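With several hung-task reports interleaved in one log, a small helper can pull out each blocked task and its call-trace frames so the common function stands out. This is a hypothetical sketch, not an existing tool; it assumes the dmesg format shown above (a timestamp prefix, "INFO: task NAME:PID blocked..." headers, and frames of the form [<address>] function+offset/size [module]).

```python
import re
from typing import Dict, List, Optional, Tuple

# Header of each hung-task report, e.g.
# "INFO: task rpciod/0:745 blocked for more than 120 seconds."
TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than \d+ seconds"
)
# One call-trace frame, e.g. "[<ffffffffa011b720>] nfs_idmap_id+0x1ed/0x287 [nfs]".
# The trailing "[module]" is optional; core-kernel frames have none.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)

Frame = Tuple[str, Optional[str]]  # (function, module or None)


def summarize_hung_tasks(log: str) -> Dict[Tuple[str, int], List[Frame]]:
    """Map (task name, pid) to its call-trace frames, innermost first."""
    headers = list(TASK_RE.finditer(log))
    summary: Dict[Tuple[str, int], List[Frame]] = {}
    for i, m in enumerate(headers):
        # A report runs from its header to the next header (or end of log),
        # so lines fused together with several timestamps are handled too.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log)
        frames = [(f.group("func"), f.group("mod"))
                  for f in FRAME_RE.finditer(log, m.end(), end)]
        summary[(m.group("name"), int(m.group("pid")))] = frames
    return summary


def tasks_in(summary: Dict[Tuple[str, int], List[Frame]], func: str) -> List[str]:
    """Names of blocked tasks whose call trace contains `func`."""
    return sorted(name for (name, _pid), frames in summary.items()
                  if any(fn == func for fn, _mod in frames))
```

Run over the log above, tasks_in(summary, "nfs_idmap_id") should list bash and rpciod/0: both are inside the NFSv4 ID-mapping code, rpciod/0 sleeping in schedule() and bash waiting on a mutex, while ntpd is blocked in a page fault (__lock_page).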


--
Anton Starikov.
Computational Material Science,
Faculty of Science and Technology,
University of Twente.
Phone: +31 (0)53 489 2986
Fax: +31 (0)53 489 2910
