Re: nfs4 hang

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I managed to make it go a little further and now the nodes will finish booting, though they hang when I start other nodes and start using files. And this was captured in the syslog from one of the nodes where nfs4 goes unresponsive. Not familiar enough with nfs4 to read it, however. Does this point to a culprit?

It's as if the nodes are writing all over each others connections, like maybe I have a shared directory where each node must have its own private directory. They each have their own /var/lib and /var/lib/nfs/rpc_pipefs mounted. Anything else that nfs clients write to (aside from proc and dev)?

May 21 17:13:45 192.168.1.243 kernel: INFO: task rpciod/0:1297 blocked for more than 120 seconds.
May 21 17:13:45 192.168.1.243 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 21 17:13:45 192.168.1.243 kernel: rpciod/0      D 00000097  2580  1297      7          1306   340 (L-TLB)
May 21 17:13:45 192.168.1.243 kernel:        de89de40 00000046 1dfa3ac0 00000097 de8ca3b4 c1984c78 00000001 0000000a
May 21 17:13:45 192.168.1.243 kernel:        de8ce000 1dfa3ac0 00000097 00000000 00000000 de8ce10c c1787100 f54ebac0
May 21 17:13:45 192.168.1.243 kernel:        c041ebcc 00000000 00000000 00000003 00000296 c1984b00 c17875bc de89de68
May 21 17:13:45 192.168.1.243 kernel: Call Trace:
May 21 17:13:45 192.168.1.243 kernel:  [<c041ebcc>] __wake_up+0x2a/0x3d
May 21 17:13:45 192.168.1.243 kernel:  [<f8916155>] rpc_queue_upcall+0x92/0x99 [sunrpc]
May 21 17:13:45 192.168.1.243 kernel:  [<f8992b04>] nfs_idmap_id+0x18b/0x1f6 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<c041f7a7>] default_wake_function+0x0/0xc
May 21 17:13:45 192.168.1.243 kernel:  [<f8992ba9>] nfs_map_name_to_uid+0x1b/0x1f [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f898d719>] decode_getfattr+0x611/0xbbd [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f898c0bd>] decode_open+0xf3/0x25e [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f898e47c>] decode_getfh+0x61/0xa3 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f898e4be>] nfs4_xdr_dec_open+0x0/0xa2 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f898e533>] nfs4_xdr_dec_open+0x75/0xa2 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f890e755>] rpcauth_unwrap_resp+0x5b/0x60 [sunrpc]
May 21 17:13:45 192.168.1.243 kernel:  [<f89097e9>] call_decode+0x437/0x47d [sunrpc]
May 21 17:13:45 192.168.1.243 kernel:  [<f898e4be>] nfs4_xdr_dec_open+0x0/0xa2 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f890df54>] __rpc_execute+0x7a/0x1f8 [sunrpc]
May 21 17:13:45 192.168.1.243 kernel:  [<c0433276>] run_workqueue+0x78/0xb5
May 21 17:13:45 192.168.1.243 kernel:  [<f890e0d2>] rpc_async_schedule+0x0/0x5 [sunrpc]
May 21 17:13:45 192.168.1.243 kernel:  [<c0433b2a>] worker_thread+0xd9/0x10b
May 21 17:13:45 192.168.1.243 kernel:  [<c041f7a7>] default_wake_function+0x0/0xc
May 21 17:13:45 192.168.1.243 kernel:  [<c0433a51>] worker_thread+0x0/0x10b
May 21 17:13:45 192.168.1.243 kernel:  [<c0435f43>] kthread+0xc0/0xed
May 21 17:13:45 192.168.1.243 kernel:  [<c0435e83>] kthread+0x0/0xed
May 21 17:13:45 192.168.1.243 kernel:  [<c0405c53>] kernel_thread_helper+0x7/0x10
May 21 17:13:45 192.168.1.243 kernel:  =======================
May 21 17:13:45 192.168.1.243 kernel: INFO: task java:13420 blocked for more than 120 seconds.
May 21 17:13:45 192.168.1.243 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 21 17:13:45 192.168.1.243 kernel: java          D 00000098  2068 13420      1         13421 13386 (NOTLB)
May 21 17:13:45 192.168.1.243 kernel:        f6bbfab0 00000086 ada06900 00000098 00000000 c06ba340 dfc6d280 0000000a
May 21 17:13:45 192.168.1.243 kernel:        f5781000 ada06900 00000098 00000000 00000000 f578110c c1787100 f649d3c0
May 21 17:13:45 192.168.1.243 kernel:        00000098 00000000 de7c6804 c1787100 de7c6800 f6bbfb1c c04ec500 ffffffff
May 21 17:13:45 192.168.1.243 kernel: Call Trace:
May 21 17:13:45 192.168.1.243 kernel:  [<c04ec500>] __next_cpu+0x12/0x21
May 21 17:13:45 192.168.1.243 kernel:  [<c061cf25>] __mutex_lock_slowpath+0x4d/0x7c
May 21 17:13:45 192.168.1.243 kernel:  [<c061cf63>] .text.lock.mutex+0xf/0x14
May 21 17:13:45 192.168.1.243 kernel:  [<f89929f2>] nfs_idmap_id+0x79/0x1f6 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<c041f7a7>] default_wake_function+0x0/0xc
May 21 17:13:45 192.168.1.243 kernel:  [<f8992ba9>] nfs_map_name_to_uid+0x1b/0x1f [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f898d719>] decode_getfattr+0x611/0xbbd [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<c042d5f5>] lock_timer_base+0x15/0x2f
May 21 17:13:45 192.168.1.243 kernel:  [<f898e025>] nfs4_xdr_dec_getattr+0x0/0x45 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f898e063>] nfs4_xdr_dec_getattr+0x3e/0x45 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f890e755>] rpcauth_unwrap_resp+0x5b/0x60 [sunrpc]
May 21 17:13:45 192.168.1.243 kernel:  [<f89097e9>] call_decode+0x437/0x47d [sunrpc]
May 21 17:13:45 192.168.1.243 kernel:  [<f898e025>] nfs4_xdr_dec_getattr+0x0/0x45 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f890df54>] __rpc_execute+0x7a/0x1f8 [sunrpc]
May 21 17:13:45 192.168.1.243 kernel:  [<f8909951>] rpc_call_sync+0x6b/0x91 [sunrpc]
May 21 17:13:45 192.168.1.243 kernel:  [<f89879f9>] nfs4_proc_getattr+0x6f/0x8b [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f897a157>] __nfs_revalidate_inode+0x120/0x249 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<c061c526>] schedule+0xa22/0xa55
May 21 17:13:45 192.168.1.243 kernel:  [<f8975dbe>] nfs_lookup_revalidate+0x1e8/0x3d7 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<c042d5f5>] lock_timer_base+0x15/0x2f
May 21 17:13:45 192.168.1.243 kernel:  [<f898baea>] decode_compound_hdr+0x35/0x72 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f898b9da>] decode_op_hdr+0xd/0x58 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f898bf3c>] nfs4_xdr_dec_access+0x54/0x8d [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<c042d5f5>] lock_timer_base+0x15/0x2f
May 21 17:13:45 192.168.1.243 kernel:  [<c042d769>] __mod_timer+0xda/0xe4
May 21 17:13:45 192.168.1.243 kernel:  [<f890e4f8>] rpc_wake_up_next+0x11f/0x126 [sunrpc]
May 21 17:13:45 192.168.1.243 kernel:  [<c042e0d2>] recalc_sigpending+0xe/0x20
May 21 17:13:45 192.168.1.243 kernel:  [<c042e194>] sigprocmask+0xb0/0xce
May 21 17:13:45 192.168.1.243 kernel:  [<f890996f>] rpc_call_sync+0x89/0x91 [sunrpc]
May 21 17:13:45 192.168.1.243 kernel:  [<f8986c9d>] _nfs4_proc_access+0xa7/0xdb [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<f8976912>] nfs_access_add_cache+0xad/0x135 [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<c048b233>] dput+0x22/0xed
May 21 17:13:45 192.168.1.243 kernel:  [<f897816c>] nfs_open_revalidate+0x186/0x1da [nfs]
May 21 17:13:45 192.168.1.243 kernel:  [<c0481e46>] do_lookup+0x135/0x174
May 21 17:13:45 192.168.1.243 kernel:  [<c0483652>] __link_path_walk+0x314/0xd33
May 21 17:13:45 192.168.1.243 kernel:  [<c04840a9>] link_path_walk+0x38/0x95
May 21 17:13:45 192.168.1.243 kernel:  [<c048446a>] do_path_lookup+0x219/0x27f
May 21 17:13:45 192.168.1.243 kernel:  [<c0483652>] __link_path_walk+0x314/0xd33
May 21 17:13:45 192.168.1.243 kernel:  [<c04840a9>] link_path_walk+0x38/0x95
May 21 17:13:45 192.168.1.243 kernel:  [<c048446a>] do_path_lookup+0x219/0x27f
May 21 17:13:45 192.168.1.243 kernel:  [<c0484bc4>] __user_walk_fd+0x29/0x3a
May 21 17:13:45 192.168.1.243 kernel:  [<c047e380>] vfs_stat_fd+0x15/0x3c
May 21 17:13:45 192.168.1.243 kernel:  [<c041f7a7>] default_wake_function+0x0/0xc
May 21 17:13:45 192.168.1.243 kernel:  [<c047e434>] sys_stat64+0xf/0x23
May 21 17:13:45 192.168.1.243 kernel:  [<c042c7a5>] getnstimeofday+0x30/0xb6
May 21 17:13:45 192.168.1.243 kernel:  [<c04f0b41>] copy_to_user+0x31/0x48
May 21 17:13:45 192.168.1.243 kernel:  [<c04356e7>] sys_clock_gettime+0x58/0x7e
May 21 17:13:45 192.168.1.243 kernel:  [<c0404f17>] syscall_call+0x7/0xb
May 21 17:13:45 192.168.1.243 kernel:  =======================


Christopher Hawkins

Cell: 315-882-2551
Fax:   716-662-0788

----- "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:

> On Thu, May 20, 2010 at 03:54:28PM -0400, Christopher Hawkins wrote:
> > Hello, 
> > 
> > I have a bash script that sets up a root filesystem over NFS. The
> method works fine over NFS3 but hangs when using NFS4 as a transport.
> Hoping someone can help me figure out why?
> > 
> > In a nutshell:
> > 
> > 1. The node boots into a ramdisk that has a few thousand files in it
> already
> > 2. connects to an NFS server exporting / and mounts that in a tmp
> dir
> > 3. bash (on the booting node) loops through the local and remote
> filesystem and creates a local symlink to anything in the nfs root
> that it doesn't already have.
> 
> Wild.
> 
> > Export options are: /
> 192.168.1.0/24(rw,no_root_squash,no_subtree_check,fsid=0)
> > Mount options are: rw,sync,noatime,rsize=1024,wsize=1024
> > 
> > rpcidmapd, portmap, statd are running on server and client, and
> rpc_pipefs is mounted on both as well. The script currently makes
> about 6200 links, total. Nfs4 hangs about 90% of the way through. I
> removed the directory it was hanging on (a simple, empty dir) and then
> it just hung on the next one.
> 
> Is it possible there's some idmap deadlock?  (If say an upcall to
> idmapd
> is hanging because the mapping requires access to some file that
> isn't
> there yet?)
> 
> A sysrq-T trace might help.
> 
> --b.
> 
> > Really appreciate any help or pointers on where to look for more
> info. Thanks!
> > 
> > Chris
> > 
> > 
> >    
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs"
> in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux