On Tue, Apr 28, 2009 at 12:34:19PM -0400, Chuck Lever wrote: > On Apr 28, 2009, at 12:25 PM, J. Bruce Fields wrote: >> On Thu, Apr 23, 2009 at 07:33:33PM -0400, Chuck Lever wrote: >>> When rpc.statd starts up in user space at boot time, it attempts to >>> write the latest NSM local state number into >>> /proc/sys/fs/nfs/nsm_local_state. >>> >>> If lockd.ko isn't loaded yet (as is the case in most configurations), >>> that file doesn't exist, thus the kernel's NSM state remains set to >>> its initial value of zero during lockd operation. >>> >>> This is a problem because rpc.statd and lockd use the NSM state >>> number >>> to prevent repeated lock recovery on rebooted hosts. If lockd sends >>> a zero NSM state, but then a delayed SM_NOTIFY with a real NSM state >>> number is received, there is no way for lockd or rpc.statd to >>> distinguish that stale SM_NOTIFY from an actual reboot. Thus lock >>> recovery could be performed after the rebooted host has already >>> started reclaiming locks, and those locks will be lost. >>> >>> We could change /etc/init.d/nfslock so it always modprobes lockd.ko >>> before starting rpc.statd. However, if lockd.ko is ever unloaded >>> and reloaded, we are back at square one, since the NSM state is not >>> preserved across an unload/reload cycle. This may happen frequently >>> on clients that use automounter. A period of NFS inactivity causes >>> lockd.ko to be unloaded, and the kernel loses its NSM state setting. >> >> Aie. Can we also fix the automounter or some other part of the >> userspace configuration? > > User space isn't the problem here... it's the fact that lockd can get > unloaded after a period of inactivity. IMO lockd should be pinned in > the kernel after it is loaded with /etc/init.d/nfslock. > >>> Instead, let's use the fact that rpc.statd plants the local system's >>> NSM state in every SM_MON (and SM_UNMON) reply. lockd performs a >>> synchronous SM_MON upcall to the local rpc.statd _before_ sending its >>> first NLM request to a new remote. This would permit rpc.statd to >>> provide the current NSM state to lockd, even after lockd.ko had been >>> unloaded and reloaded. >>> >>> Note that NLMPROC_LOCK arguments are constructed before the >>> nsm_monitor() call, so we have to rearrange argument construction >>> very >>> slightly to make this all work out. >>> >>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >>> --- >>> >>> fs/lockd/clntproc.c | 2 +- >>> fs/lockd/mon.c | 6 +++++- >>> 2 files changed, 6 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c >>> index dd79570..f55b900 100644 >>> --- a/fs/lockd/clntproc.c >>> +++ b/fs/lockd/clntproc.c >>> @@ -126,7 +126,6 @@ static void nlmclnt_setlockargs(struct nlm_rqst >>> *req, struct file_lock *fl) >>> struct nlm_lock *lock = &argp->lock; >>> >>> nlmclnt_next_cookie(&argp->cookie); >>> - argp->state = nsm_local_state; >>> memcpy(&lock->fh, NFS_FH(fl->fl_file->f_path.dentry->d_inode), >>> sizeof(struct nfs_fh)); >>> lock->caller = utsname()->nodename; >>> lock->oh.data = req->a_owner; >>> @@ -519,6 +518,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct >>> file_lock *fl) >>> >>> if (nsm_monitor(host) < 0) >>> goto out; >>> + req->a_args.state = nsm_local_state; >> >> Hm. It looks like a_args.state is never used, except in ifdef'd-out >> code in nlm4svc_proc_lock() and nlmsvc_proc_lock() ifdef'd out. >> Something's wrong there. (Not your fault; but needs looking into.) > > This isn't a big deal on the server side (I guess I should give this > patch to Trond instead of you, in that case). > > The client passes its NSM state number to the server in NLMPROC_LOCK > calls. There is no mechanism for the server to pass its NSM state > number to the client via the NLM protocol. So the first the client is > aware of the server's NSM state number is after the server reboots (via > SM_NOTIFY). If the server never reboots, the client will never know the > server's NSM state number. So the #if 0'd code should just be deleted? --b. > >>> fl->fl_flags |= FL_ACCESS; >>> status = do_vfs_lock(fl); >>> diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c >>> index 6d5d4a4..5017d50 100644 >>> --- a/fs/lockd/mon.c >>> +++ b/fs/lockd/mon.c >>> @@ -188,8 +188,12 @@ int nsm_monitor(const struct nlm_host *host) >>> status = -EIO; >>> if (status < 0) >>> printk(KERN_NOTICE "lockd: cannot monitor %s\n", nsm->sm_name); >>> - else >>> + else { >>> nsm->sm_monitored = 1; >>> + nsm_local_state = res.state; >>> + dprintk("lockd: nsm_monitor: NSM state is now %d\n", >>> + nsm_local_state); >> >> Could we make that a dprintk in the case where this changes nsm_local >> state from something other than zero (nsm_lock_state && >> nsm_local_state >> != res.state)? >> >> (Just to make sure no statd is returning inconsistent nsm_local_stats >> here.) > > I'm not sure that's a big deal, but... > > Note that the XNFS version 3 spec suggests the local lockd should > request the NSM state number when it starts up by posting an > SM_UNMON_ALL to the local statd. That might be safer than loading it > after every SM_MON. > >>> + } >>> return status; >>> } >> >> --b. > > -- > Chuck Lever > chuck[dot]lever[at]oracle[dot]com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html