Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

On Thu, 2009-08-27 at 10:54 -0400, Chuck Lever wrote:
> On Aug 27, 2009, at 10:52 AM, Trond Myklebust wrote:
> > On Thu, 2009-08-27 at 10:38 -0400, Chuck Lever wrote:
> >> On Aug 27, 2009, at 4:54 AM, Ian Kent wrote:
> >>> Ian Kent wrote:
> >>>> Carlos André wrote:
> >>>>> Hi Ian,
> >>>>>
> >>>>> Thanks for the patch, and sorry for the delay (I was expecting
> >>>>> your reply on the bug tracker, not here) :)
> >>>>>
> >>>>> But this patch didn't work for me as expected...  :(
> >>>>>
> >>>>>
> >>>>> First I changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10",
> >>>>> and later changed "10" to "2", with the same results
> >>>>> (restarting the service each time, of course :)
> >>>>>
> >>>>> Then I tried removing "sec=krb5p", and later removed "nfs4", but
> >>>>> I got the same results again.
> >>>>>
> >>>>> Or am I doing something wrong?
> >>>>>
> >>>>>
> >>>>> [root@KSTATION areas]# automount -V
> >>>>>
> >>>>> Linux automount version 5.0.1-0.rc2.131.bz517349.1
> >>>>> [...]
> >>>>>
> >>>>> [root@KSTATION areas]# time ls -la testdown
> >>>>> ls: testdown: No such file or directory
> >>>>>
> >>>>> real    3m9.006s
> >>>>> user    0m0.002s
> >>>>> sys     0m0.000s
> >>>>
> >>>> OK, that isn't behaving the way I expect, I'll have a look.
> >>>>
> >>>>>
> >>>>> LOGGING:
> >>>>> -----------------------------------------
> >>>>> Aug 24 09:23:51 KSTATION automount[20803]: mount_mount: mount(nfs): calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/testdown /misc/areas/testdown
> >>>>> Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
> >>>>> Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token = 91
> >>>>> Aug 24 09:27:00 KSTATION automount[20803]: failed to mount /misc/areas/testdown
> >>>>> -----------------------------------------
> >>>
> >>> Having a look at this, I suspect the reason it doesn't work as expected
> >>> is that the waitpid(2) we do after sending the TERM signal to the mount
> >>> process (which we have to do) is not returning. This is likely because
> >>> the mount process isn't giving up as quickly as it used to.
> >>
> >> You're thinking maybe mount(2) should be as interruptible as the
> >> socket calls that the mount command used to do?  That might be
> >> reasonable, and I can take a look at that.
> >
> > In recent kernels, all those RPC calls should be using TASK_KILLABLE
> > sleep states. SIGTERM should cause them to abort, provided that some
> > process isn't blocking it.
> >
> > Perhaps TASK_KILLABLE could be backported to RHEL-5?
> 
> That's pretty extensive, with hooks in the page cache.  I doubt RH
> would go for that.

You don't have to add the hooks in the page cache in order to make mount
interruptible. You just need to replace the sigmask-manipulation in
net/sunrpc and fs/nfs (a.k.a. rpc_clnt_sigmask()/rpc_clnt_sigunmask())
with TASK_KILLABLE.

Alternatively, it might suffice to just turn on the 'intr' flag
temporarily while doing the mount path walk, and then switch it to
whatever default the user actually specified afterwards.

Trond
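
For readers following the thread, the userspace pattern under discussion
(wait for the mount helper up to a limit, send SIGTERM, then waitpid(2))
can be sketched roughly as below. This is only an illustration, not the
autofs code: the function name spawn_with_timeout() and the 100 ms polling
interval are assumptions made for the example. The point it shows is that
the final blocking waitpid(2) returns only once the child has really
exited, so if the child sits inside a system call that does not react to
SIGTERM, the caller stays blocked no matter how short the timeout was.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from autofs): run argv[] and give it
 * wait_secs seconds to finish.
 * Returns 0 if the child exited by itself within the limit,
 *         1 if SIGTERM had to be sent,
 *        -1 on error.
 * *status receives the raw waitpid(2) status.
 */
int spawn_with_timeout(char *const argv[], int wait_secs, int *status)
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }

    /* Poll every 100 ms with WNOHANG until the limit is reached. */
    struct timespec tick = { 0, 100 * 1000 * 1000 };
    int ticks = wait_secs * 10;
    for (int i = 0; ; i++) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 0;               /* child finished in time */
        if (r < 0 && errno != EINTR)
            return -1;
        if (i >= ticks)
            break;                  /* limit reached, child still running */
        nanosleep(&tick, NULL);
    }

    /* Limit reached: ask the child to stop. */
    if (kill(pid, SIGTERM) < 0)
        return -1;

    /*
     * Blocking wait. If the child is stuck in a system call that does
     * not respond to SIGTERM, the signal stays pending and this call
     * does not return until that system call completes.
     */
    while (waitpid(pid, status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return 1;
}
```

With an ordinary child such as "sleep 30" and a 1 second limit, the
function returns shortly after the limit because sleep dies on SIGTERM.
A child blocked in a non-interruptible mount(2) would instead keep the
caller in the last waitpid(2) loop.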
