Re: [PATCH v5 5/5] nfsd: add the infrastructure to handle the cld upcall

On Sat, 4 Feb 2012 06:49:06 -0500
Jeff Layton <jlayton@xxxxxxxxxx> wrote:

> On Fri, 3 Feb 2012 17:57:36 -0500
> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> 
> > On Wed, Feb 01, 2012 at 10:44:12AM -0500, Jeff Layton wrote:
> > > ...and add a mechanism for switching between the "legacy" tracker and
> > > the new one. The decision is made by looking to see whether the
> > > v4recoverydir exists. If it does, then the legacy client tracker is
> > > used.
> > > 
> > > If it's not, then the kernel will create a "cld" pipe in rpc_pipefs.
> > > That pipe is used to talk to a daemon for handling the upcall.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > ---
> > >  fs/nfsd/nfs4recover.c |  364 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > >  1 files changed, 363 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> > > index 3fbf1f4..592fe3d 100644
> > > --- a/fs/nfsd/nfs4recover.c
> > > +++ b/fs/nfsd/nfs4recover.c
> > > @@ -1,5 +1,6 @@
> > >  /*
> > >  *  Copyright (c) 2004 The Regents of the University of Michigan.
> > > +*  Copyright (c) 2011 Jeff Layton <jlayton@xxxxxxxxxx>
> > >  *  All rights reserved.
> > >  *
> > >  *  Andy Adamson <andros@xxxxxxxxxxxxxx>
> > > @@ -36,6 +37,10 @@
> > >  #include <linux/namei.h>
> > >  #include <linux/crypto.h>
> > >  #include <linux/sched.h>
> > > +#include <linux/fs.h>
> > > +#include <linux/sunrpc/rpc_pipe_fs.h>
> > > +#include <linux/sunrpc/clnt.h>
> > > +#include <linux/nfsd/cld.h>
> > >  
> > >  #include "nfsd.h"
> > >  #include "state.h"
> > > @@ -472,12 +477,369 @@ static struct nfsd4_client_tracking_ops nfsd4_legacy_tracking_ops = {
> > >  	.grace_done	= nfsd4_recdir_purge_old,
> > >  };
> > >  
> > > +/* Globals */
> > > +#define NFSD_PIPE_DIR		"/nfsd"
> > > +
> > > +static struct dentry *cld_pipe;
> > > +
> > > +/* list of cld_msg's that are currently in use */
> > > +static DEFINE_SPINLOCK(cld_lock);
> > > +static LIST_HEAD(cld_list);
> > > +static unsigned int cld_xid;
> > > +
> > > +struct cld_upcall {
> > > +	struct list_head	cu_list;
> > > +	struct task_struct	*cu_task;
> > > +	struct cld_msg	cu_msg;
> > > +};
> > > +
> > > +static int
> > > +__cld_pipe_upcall(struct cld_msg *cmsg)
> > > +{
> > > +	int ret;
> > > +	struct rpc_pipe_msg msg;
> > > +	struct inode *inode = cld_pipe->d_inode;
> > > +
> > > +	memset(&msg, 0, sizeof(msg));
> > > +	msg.data = cmsg;
> > > +	msg.len = sizeof(*cmsg);
> > > +
> > > +	/*
> > > +	 * Set task state before we queue the upcall. That prevents
> > > +	 * wake_up_process in the downcall from racing with schedule.
> > > +	 */
> > > +	set_current_state(TASK_UNINTERRUPTIBLE);
> > 
> > Is there a risk of nfsd being left unkillable if the daemon dies or
> > becomes unresponsive?
> > 
> 
> If the daemon dies, then the pipe will be closed and this will get
> woken back up with msg.errno set to -EAGAIN. At that point, it'll
> get requeued and then will eventually time out via the
> RPC_PIPE_WAIT_FOR_OPEN workqueue job.
> 
> If the daemon is unresponsive but holds the pipe open, then this will
> sleep indefinitely. I suppose we could use a schedule_timeout() below
> with a 30s timeout or so and then dequeue the upcall if it does time
> out. I'll note though that none of the other rpc_pipefs upcalls do
> that.
> 
> Hmm...now that I look, though, I think I see a bug in the rpc_pipefs
> code. When an upcall is read off the pipe, it moves to the in_upcall
> list. If there's never a downcall for it, then I think it'll just
> linger there forever until the pipe is removed. Even closing the pipe
> won't unwedge it. It seems like we should return an error for those as
> well. I'll see about spinning up a patch for that too...
>

Ok, I had a bit more time to look over this... 

My mistake -- that last paragraph is wrong. I neglected to note that
upcalls that have been partially read are moved to filp->private_data as
well, so if the pipe is closed after part of a call has been read, the
upcall still ends up getting an -EAGAIN back.
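
For reference, the way that error gets back to the sleeping thread is
via the ->destroy_msg op in the rpc_pipe_ops -- rpc_pipefs sets
msg->errno before calling it. Roughly (paraphrasing the part of the
patch that isn't quoted above, and assuming cu_task was stashed in the
cld_upcall at allocation time):

static void
cld_pipe_destroy_msg(struct rpc_pipe_msg *msg)
{
	struct cld_msg *cmsg = msg->data;
	struct cld_upcall *cup = container_of(cmsg, struct cld_upcall,
						cu_msg);

	/* errno >= 0 means we got a real downcall; a negative value
	 * (e.g. -EAGAIN on pipe close) means we never will, so wake up
	 * the nfsd thread sleeping in __cld_pipe_upcall() and let it
	 * pick the error out of msg.errno */
	if (msg->errno < 0)
		wake_up_process(cup->cu_task);
}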

I think we can make these calls eventually time out if the daemon goes
unresponsive. It will take some fairly significant reengineering,
however. We'll need to embed each rpc_pipe_msg inside another struct,
allocate those from the slab and refcount them, and then (carefully!)
deal with them if the schedule_timeout() times out.
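
To make that concrete, here's the rough shape I have in mind -- a
completely untested sketch with names made up on the spot, and with the
allocation and the callers hand-waved away (the caller would kzalloc()
the thing, kref_init() it, and drop its own reference once it's done
with the reply):

struct cld_upcall {
	struct kref		cu_ref;
	struct list_head	cu_list;
	struct task_struct	*cu_task;
	struct rpc_pipe_msg	cu_pmsg;	/* no longer on the caller's stack */
	struct cld_msg		cu_msg;
};

static void
cld_upcall_release(struct kref *ref)
{
	kfree(container_of(ref, struct cld_upcall, cu_ref));
}

static int
__cld_pipe_upcall(struct cld_upcall *cup)
{
	int ret;
	struct inode *inode = cld_pipe->d_inode;

	cup->cu_pmsg.data = &cup->cu_msg;
	cup->cu_pmsg.len = sizeof(cup->cu_msg);

	/* extra reference for the pipe, dropped in ->destroy_msg */
	kref_get(&cup->cu_ref);

	set_current_state(TASK_UNINTERRUPTIBLE);
	ret = rpc_queue_upcall(inode, &cup->cu_pmsg);
	if (ret < 0) {
		set_current_state(TASK_RUNNING);
		/* never queued, so ->destroy_msg won't drop this one */
		kref_put(&cup->cu_ref, cld_upcall_release);
		return ret;
	}

	/* give the daemon 30s to answer before giving up */
	if (schedule_timeout(30 * HZ) == 0)
		return -ETIMEDOUT;

	if (cup->cu_pmsg.errno < 0)
		ret = cup->cu_pmsg.errno;
	return ret;
}

The ->destroy_msg op would do the matching kref_put once the message
falls off the pipe's lists, so a timed-out caller and a late downcall
can't free the thing out from under each other. The remaining wrinkle
is that the wakeup side can still fire after a timed-out caller has
returned, so cu_task would need its own reference (or the wakeup would
have to be synchronized against the caller some other way) -- that's
where the "carefully" comes in.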

The question is -- is that really worth it? None of the other rpc_pipefs
upcalls do that (aside from the gssapi one). My preference would be not
to bother with it until we find that a hung daemon is a problem. That
said, if you think it's a necessary precondition to merging this, then
I'll see about making those changes.

Thoughts?

-- 
Jeff Layton <jlayton@xxxxxxxxxx>

