Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Ondrej Valousek <webserv@xxxxxxxxxx> · Thu, 17 Sep 2009 15:12:01 +0200

https://bugzilla.redhat.com/show_bug.cgi?id=517349
no news.....

Carlos André wrote:
Hi ppl,

any news about this problem? :)

Thanks.

2009/8/27 Chuck Lever <chuck.lever@xxxxxxxxxx>:

On Aug 27, 2009, at 11:00 AM, Trond Myklebust wrote:

On Thu, 2009-08-27 at 10:54 -0400, Chuck Lever wrote:

On Aug 27, 2009, at 10:52 AM, Trond Myklebust wrote:

On Thu, 2009-08-27 at 10:38 -0400, Chuck Lever wrote:

On Aug 27, 2009, at 4:54 AM, Ian Kent wrote:

Ian Kent wrote:

Carlos André wrote:

Hi Ian,

Thanks for the patch, and sorry for the delay (I was expecting to
receive your reply on the bug tracker, not here) :)

But this patch didn't work for me as expected...  :(

First I changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10",
and later changed "10" to "2", with the same results...
(always restarting the service, of course :)

Then I tried removing "sec=krb5p", and later removed "nfs4", but I got
the same results again.

Or am I doing something wrong?

[root@KSTATION areas]# automount -V

Linux automount version 5.0.1-0.rc2.131.bz517349.1
[...]

[root@KSTATION areas]# time ls -la testdown
ls: testedown: No such file or directory

real    3m9.006s
user    0m0.002s
sys     0m0.000s

OK, that isn't behaving the way I expect, I'll have a look.

LOGGING:
-----------------------------------------
Aug 24 09:23:51 KSTATION automount[20803]: mount_mount: mount(nfs): calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/testdown /misc/areas/testdown
Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token = 91
Aug 24 09:27:00 KSTATION automount[20803]: failed to mount /misc/areas/testdown
-----------------------------------------

Having a look at this, I suspect the reason it doesn't work as expected
is that the waitpid(2) we do after sending the TERM signal to the mount
process (which we have to do) is not returning. This is likely because
the mount process isn't giving up as quickly as it used to.
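
For illustration only, here is a minimal userspace sketch of the pattern described above: send SIGTERM to a child, then reap it with a bounded wait instead of a blocking waitpid(2). This is not the autofs code; the helper names wait_bounded() and stop_child(), the 100 ms poll interval, and the escalation to SIGKILL are all assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not from autofs): poll waitpid(2) with WNOHANG for
 * up to `secs` seconds so the caller never blocks indefinitely.
 * Returns 1 if the child was reaped, 0 on timeout, -1 on error. */
int wait_bounded(pid_t pid, int secs, int *status)
{
    struct timespec tick = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    int i;

    for (i = 0; i < secs * 10; i++) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 1;
        if (r < 0 && errno != EINTR)
            return -1;
        nanosleep(&tick, NULL);
    }
    return 0;
}

/* Hypothetical helper: send SIGTERM, allow `grace` seconds, then escalate
 * to SIGKILL. Returns the signal after which the child was reaped
 * (SIGTERM or SIGKILL), 0 if it could not be reaped even after SIGKILL
 * (for example, a task stuck in an uninterruptible sleep), -1 on error. */
int stop_child(pid_t pid, int grace)
{
    int status;
    int r;

    if (kill(pid, SIGTERM) < 0)
        return -1;
    r = wait_bounded(pid, grace, &status);
    if (r != 0)
        return r < 0 ? -1 : SIGTERM;

    if (kill(pid, SIGKILL) < 0)
        return -1;
    r = wait_bounded(pid, grace, &status);
    if (r != 0)
        return r < 0 ? -1 : SIGKILL;
    return 0;
}
```

A caller would fork the mount helper, wait for the configured timeout, and then call stop_child(pid, grace). Note that a bounded wait only keeps the caller from hanging; it does not make the stuck child exit any sooner if the kernel ignores the signal while the task sleeps.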

You're thinking maybe mount(2) should be as interruptible as the
socket calls that the mount command used to do?  That might be
reasonable, and I can take a look at that.
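
As a userspace analogy only (it says nothing about how mount(2) behaves in the kernel), the sketch below shows what "interruptible" means for an ordinary blocking call: a read(2) on an empty pipe returns early with EINTR when a signal handler installed without SA_RESTART runs. The function name blocked_read_errno() is made up for the example, and SIGALRM stands in for the SIGTERM that the automounter would send.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: its only job is to interrupt the blocking call. */
static void on_signal(int sig)
{
    (void)sig;
}

/* Hypothetical demo: block in read(2) on an empty pipe and return the
 * errno value seen when read returns (0 if it returned without error,
 * -1 if the setup itself failed). */
int blocked_read_errno(void)
{
    int fds[2];
    char c;
    ssize_t n;
    int err;
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;              /* no SA_RESTART: read is not restarted */

    if (pipe(fds) < 0)
        return -1;
    if (sigaction(SIGALRM, &sa, NULL) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    alarm(1);                     /* deliver SIGALRM after one second */
    n = read(fds[0], &c, 1);      /* blocks: nothing is ever written */
    err = (n < 0) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

With SA_RESTART set in sa_flags the read would be restarted after the handler ran and the call would keep blocking, which is roughly the behaviour the automounter sees when the mount process does not give up on a signal.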

In recent kernels, all those RPC calls should be using TASK_KILLABLE
sleep states. SIGTERM should cause them to abort, provided that some
process isn't blocking it.

Perhaps TASK_KILLABLE could be backported to RHEL-5?

That's pretty extensive, with hooks in the page cache.  I doubt RH
would go for that.

You don't have to add the hooks in the page cache in order to make mount
interruptible. You just need to replace the sigmask-manipulation in
net/sunrpc and fs/nfs (a.k.a. rpc_clnt_sigmask()/rpc_clnt_sigunmask())
with TASK_KILLABLE.

That sounds like a schlep.

Alternatively, it might suffice to just turn on the 'intr' flag
temporarily while doing the mount path walk, and then switch it to
whatever default the user actually specified afterwards.

That sounds easy, especially for an EL5 kernel.  Maybe "soft" too for the
first few requests?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
