Re: umount(,MNT_DETACH) for nfsv4 hangs when using sec=krb5 and network is down


On 12/20/2012 03:01 PM, Myklebust, Trond wrote:
> On Thu, 2012-12-20 at 14:52 -0700, Orion Poplawski wrote:
>> On 12/20/2012 01:47 PM, Myklebust, Trond wrote:
>>> On Thu, 2012-12-20 at 13:31 -0700, Orion Poplawski wrote:
>>>> On 12/19/2012 03:19 PM, Myklebust, Trond wrote:
>>>>>
>>>>> Commit eb96d5c97b0825d542e9c4ba5e0a22b519355166 (SUNRPC handle
>>>>> EKEYEXPIRED in call_refreshresult), which will be in 3.8-rc1 when Linus
>>>>> releases it, may help.
>>>>>
>>>>
>>>> FWIW - I cherry picked that into the latest Fedora rawhide kernel but no
>>>> effect.  Sounds like a nice patch though, the current hang forever behavior
>>>> doesn't seem to trigger the needed "ah, need a new ticket" response.
>>>>
>>>
>>> So does simply killing the rpc.gssd process help?
>>>
>>
>> Yes, if automount is already stopped (these are automounted directories).  If
>> automount is running, it still seems to hang.  I think I'm going to need to
>> spend some time talking to Ian.
>>
> 
> I'd suggest also taking a long hard look at rpc.gssd and making sure
> that it handles ENETUNREACH, ECONNREFUSED and friends correctly. I
> suspect right now it is just bailing out of the upcall instead of
> completing it by propagating the error reply to the kernel.
> 
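Trond's point, that the daemon should always complete the upcall with an error reply rather than abandon it, can be illustrated with a small C sketch. The helper `upcall_reply_errno()` is hypothetical and is not rpc.gssd's actual code; the choice to pass network errors through unchanged and to map everything else to EACCES is only an illustrative assumption. What matters is that every failure path yields a reply, so the kernel is never left waiting on an upcall nobody will answer.

```c
#include <errno.h>

/* Hypothetical helper, NOT taken from rpc.gssd: given the result of an
 * attempt to build a GSS context (0 = success, otherwise an errno value),
 * decide which errno to write back to the kernel in the upcall reply.
 * Returning 0 means "send the context"; any nonzero value means "send an
 * error reply" so the waiting RPC can fail instead of hanging. */
static int upcall_reply_errno(int ctx_err)
{
    switch (ctx_err) {
    case 0:
        return 0;              /* context created: reply with it */
    case ENETUNREACH:
    case EHOSTUNREACH:
    case ECONNREFUSED:
    case ETIMEDOUT:
        return ctx_err;        /* network trouble: pass it through as-is */
    default:
        return EACCES;         /* anything else: generic auth failure (assumed mapping) */
    }
}
```

The caller would then always write some reply to the kernel, whichever value comes back; the only forbidden outcome is returning from the upcall handler without replying at all.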

Actually, I take that back - I'm not sure it's directly involved and killing
rpc.gssd doesn't seem to be helping me now.  I connected to rpc.gssd with
strace, dropped the interface and tried to umount.nfs4 -l but rpc.gssd is
still in poll and doesn't do anything.  Kernel process trace shows:

[ 2788.807017] umount.nfs4     D ffff88007cc13d40     0  3001   3000 0x00000084
[ 2788.807017]  ffff8800361319a8 0000000000000082 ffff880036131fd8 0000000000013d40
[ 2788.807017]  ffff880036131fd8 0000000000013d40 ffff8800773add80 ffff8800773add80
[ 2788.807017]  ffff88007cfe2cb8 0000000000000082 ffffffffa0009cc0 ffff880036131a20
[ 2788.807017] Call Trace:
[ 2788.807017]  [<ffffffffa0009cc0>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[ 2788.807017]  [<ffffffff81634b39>] schedule+0x29/0x70
[ 2788.807017]  [<ffffffffa0009cf5>] rpc_wait_bit_killable+0x35/0x90 [sunrpc]
[ 2788.807017]  [<ffffffff816335a0>] __wait_on_bit+0x60/0x90
[ 2788.807017]  [<ffffffffa0001c50>] ? call_connect+0x90/0x90 [sunrpc]
[ 2788.807017]  [<ffffffffa0009cc0>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[ 2788.807017]  [<ffffffff81633707>] out_of_line_wait_on_bit+0x77/0x90
[ 2788.807017]  [<ffffffff81080560>] ? autoremove_wake_function+0x40/0x40
[ 2788.807017]  [<ffffffffa0001c50>] ? call_connect+0x90/0x90 [sunrpc]
[ 2788.807017]  [<ffffffffa0001c50>] ? call_connect+0x90/0x90 [sunrpc]
[ 2788.807017]  [<ffffffffa000ac7a>] __rpc_execute+0x13a/0x3f0 [sunrpc]
[ 2788.807017]  [<ffffffffa000bd65>] rpc_execute+0x55/0x90 [sunrpc]
[ 2788.807017]  [<ffffffffa0002e60>] rpc_run_task+0x70/0x90 [sunrpc]
[ 2788.807017]  [<ffffffffa0002ec3>] rpc_call_sync+0x43/0xa0 [sunrpc]
[ 2788.807017]  [<ffffffffa01f5653>] _nfs4_call_sync+0x13/0x20 [nfsv4]
[ 2788.807017]  [<ffffffffa01f4e50>] _nfs4_proc_getattr+0xb0/0xc0 [nfsv4]
[ 2788.807017]  [<ffffffffa01f9b9e>] nfs4_proc_getattr+0x4e/0x70 [nfsv4]
[ 2788.807017]  [<ffffffffa01b67bc>] __nfs_revalidate_inode+0x8c/0x200 [nfs]
[ 2788.807017]  [<ffffffffa01b69a3>] nfs_revalidate_inode+0x73/0xa0 [nfs]
[ 2788.807017]  [<ffffffffa01afc60>] nfs_check_verifier+0x50/0x80 [nfs]
[ 2788.807017]  [<ffffffffa01b255b>] nfs_lookup_revalidate+0x2fb/0x470 [nfs]
[ 2788.807017]  [<ffffffffa01b2705>] nfs4_lookup_revalidate+0x35/0xe0 [nfs]
[ 2788.807017]  [<ffffffff811a18fb>] complete_walk+0xbb/0x110
[ 2788.807017]  [<ffffffff811a3310>] path_lookupat+0x70/0x7f0
[ 2788.807017]  [<ffffffff811a216f>] ? getname_flags+0x4f/0x1a0
[ 2788.807017]  [<ffffffff811a3abb>] filename_lookup+0x2b/0xc0
[ 2788.807017]  [<ffffffff811a67c4>] user_path_at_empty+0x54/0x90
[ 2788.807017]  [<ffffffff8117e4e6>] ? kmem_cache_free+0x46/0x1f0
[ 2788.807017]  [<ffffffff8115d4e3>] ? remove_vma+0x63/0x70
[ 2788.807017]  [<ffffffff811a6811>] user_path_at+0x11/0x20
[ 2788.807017]  [<ffffffff811b57af>] sys_umount+0x3f/0x3a0
[ 2788.807017]  [<ffffffff81639f7e>] ? do_page_fault+0xe/0x10
[ 2788.807017]  [<ffffffff8163e419>] system_call_fastpath+0x16/0x1b

But every other process is just sitting in schedule.
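`umount.nfs4 -l`, like `umount -l`, asks the kernel for a lazy unmount through `umount2()` with the `MNT_DETACH` flag. A minimal C sketch of that call follows; `lazy_unmount()` is a hypothetical wrapper name. Judging by the trace, the task is stuck before the unmount itself: `sys_umount` first resolves the path via `user_path_at`, and `nfs4_lookup_revalidate` sends a GETATTR to the server, which blocks while the network is down. So `MNT_DETACH` on its own does not help when the lookup of the mount point needs the server.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical wrapper: request a lazy ("detach") unmount of target, the
 * same kind of request `umount -l` makes. Returns 0 on success, otherwise
 * the errno reported by umount2(). Note that the kernel still has to look
 * up `target` before detaching it, which is where the reported hang is. */
static int lazy_unmount(const char *target)
{
    if (umount2(target, MNT_DETACH) == 0)
        return 0;

    int err = errno;
    fprintf(stderr, "umount2(%s, MNT_DETACH): %s\n", target, strerror(err));
    return err;
}
```

Running this needs the appropriate privileges (normally root); an unprivileged caller or a nonexistent path simply gets an error back.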

The mount point gets "deleted":

# grep mnt /proc/mounts
earth:/export/home/orion /mnt\040(deleted) nfs4 rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=10.10.11.101,local_lock=none,addr=10.10.10.1 0 0

but that's it.
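/proc/mounts escapes awkward characters in its fields as three-digit octal sequences (space, tab, newline and backslash, at least), so the `\040` above is a space and the mount point reads `/mnt (deleted)`. The sketch below uses hypothetical helper names: it decodes those escapes in place and then checks for the ` (deleted)` suffix.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: decode \ooo octal escapes (e.g. \040 for a space)
 * in a /proc/mounts field, in place. A backslash not followed by three
 * octal digits is copied unchanged. */
static void unescape_mount_field(char *s)
{
    char *out = s;
    const char *in = s;

    while (*in) {
        if (in[0] == '\\' &&
            in[1] >= '0' && in[1] <= '7' &&
            in[2] >= '0' && in[2] <= '7' &&
            in[3] >= '0' && in[3] <= '7') {
            *out++ = (char)(((in[1] - '0') << 6) |
                            ((in[2] - '0') << 3) |
                            (in[3] - '0'));
            in += 4;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}

/* Hypothetical helper: true if an already-decoded mount point ends in
 * " (deleted)", as /mnt does in the output above. */
static bool mountpoint_deleted(const char *decoded)
{
    static const char suffix[] = " (deleted)";
    size_t n = strlen(decoded);
    size_t m = sizeof suffix - 1;

    return n >= m && strcmp(decoded + n - m, suffix) == 0;
}
```

The suffix test is only a heuristic: a directory whose real name ends in " (deleted)" would match as well.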

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder Office                  FAX: 303-415-9702
3380 Mitchell Lane                       orion@xxxxxxxx
Boulder, CO 80301                   http://www.nwra.com

