On Sat, 4 Feb 2012 06:49:06 -0500
Jeff Layton <jlayton@xxxxxxxxxx> wrote:

> On Fri, 3 Feb 2012 17:57:36 -0500
> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> 
> > On Wed, Feb 01, 2012 at 10:44:12AM -0500, Jeff Layton wrote:
> > > ...and add a mechanism for switching between the "legacy" tracker and
> > > the new one. The decision is made by looking to see whether the
> > > v4recoverydir exists. If it does, then the legacy client tracker is
> > > used.
> > > 
> > > If it's not, then the kernel will create a "cld" pipe in rpc_pipefs.
> > > That pipe is used to talk to a daemon for handling the upcall.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > ---
> > >  fs/nfsd/nfs4recover.c |  364 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > >  1 files changed, 363 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> > > index 3fbf1f4..592fe3d 100644
> > > --- a/fs/nfsd/nfs4recover.c
> > > +++ b/fs/nfsd/nfs4recover.c
> > > @@ -1,5 +1,6 @@
> > >  /*
> > >  * Copyright (c) 2004 The Regents of the University of Michigan.
> > > +* Copyright (c) 2011 Jeff Layton <jlayton@xxxxxxxxxx>
> > >  * All rights reserved.
> > >  *
> > >  * Andy Adamson <andros@xxxxxxxxxxxxxx>
> > > @@ -36,6 +37,10 @@
> > >  #include <linux/namei.h>
> > >  #include <linux/crypto.h>
> > >  #include <linux/sched.h>
> > > +#include <linux/fs.h>
> > > +#include <linux/sunrpc/rpc_pipe_fs.h>
> > > +#include <linux/sunrpc/clnt.h>
> > > +#include <linux/nfsd/cld.h>
> > >  
> > >  #include "nfsd.h"
> > >  #include "state.h"
> > > @@ -472,12 +477,369 @@ static struct nfsd4_client_tracking_ops nfsd4_legacy_tracking_ops = {
> > >         .grace_done = nfsd4_recdir_purge_old,
> > >  };
> > >  
> > > +/* Globals */
> > > +#define NFSD_PIPE_DIR   "/nfsd"
> > > +
> > > +static struct dentry *cld_pipe;
> > > +
> > > +/* list of cld_msg's that are currently in use */
> > > +static DEFINE_SPINLOCK(cld_lock);
> > > +static LIST_HEAD(cld_list);
> > > +static unsigned int cld_xid;
> > > +
> > > +struct cld_upcall {
> > > +        struct list_head        cu_list;
> > > +        struct task_struct      *cu_task;
> > > +        struct cld_msg          cu_msg;
> > > +};
> > > +
> > > +static int
> > > +__cld_pipe_upcall(struct cld_msg *cmsg)
> > > +{
> > > +        int ret;
> > > +        struct rpc_pipe_msg msg;
> > > +        struct inode *inode = cld_pipe->d_inode;
> > > +
> > > +        memset(&msg, 0, sizeof(msg));
> > > +        msg.data = cmsg;
> > > +        msg.len = sizeof(*cmsg);
> > > +
> > > +        /*
> > > +         * Set task state before we queue the upcall. That prevents
> > > +         * wake_up_process in the downcall from racing with schedule.
> > > +         */
> > > +        set_current_state(TASK_UNINTERRUPTIBLE);
> > 
> > Is there a risk of nfsd being left unkillable if the daemon dies or
> > becomes unresponsive?
> > 
> 
> If the daemon dies, then the pipe will be closed and this will get
> woken back up with msg.errno set to -EAGAIN. At that point, it'll
> get requeued and then will eventually time out via the
> RPC_PIPE_WAIT_FOR_OPEN workqueue job.
> 
> If the daemon is unresponsive but holds the pipe open then this will
> sleep indefinitely. I suppose we could use a schedule_timeout() below
> with a 30s timeout or so and then dequeue it if it does. I'll note
> though that none of the other rpc_pipefs upcalls do that.
> 
> Hmm...now that I look though, I think I see a bug in the rpc_pipefs
> code. When an upcall is read off the pipe, it moves to the in_upcall
> list. If there's never a downcall for it, then I think it'll just
> linger there forever until the pipe is removed.
> Even closing the pipe
> won't unwedge it. It seems like we should return an error for those as
> well. I'll see about spinning up a patch for that too...
> 

Ok, I had a bit more time to look over this...

My mistake -- that last paragraph is wrong. I neglected to note that
upcalls that are partially read end up moving to filp->private_data as
well. If the pipe is closed after reading part of a call, then the upcall
will end up getting an -EAGAIN back.

I think we can make these calls eventually time out if the daemon goes
unresponsive. It will take some fairly significant reengineering, however.
We'll need to embed each rpc_pipe_msg inside another struct, allocate
those structs from the slab and refcount them, and then (carefully!) deal
with them if the schedule_timeout() times out.

The question is -- is that really worth it? None of the other rpc_pipe
upcalls do that (aside from the gssapi one). My preference would be not to
bother with it until we find that a hung daemon is a problem. That said,
if you think it's a necessary precondition to merging this, then I'll see
about making those changes.

Thoughts?
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
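
For reference, below is a rough sketch of the reengineering described
above: each rpc_pipe_msg embedded in a slab-allocated, refcounted wrapper,
so the upcalling task can stop waiting after a bounded timeout while
rpc_pipefs (and a late downcall) can still safely touch the message. This
is an illustration, not code from the posted patch. The struct, function
and slab names, the 30-second figure, and the use of a completion with
wait_for_completion_timeout() in place of the patch's task pointer plus
wake_up_process() are all assumptions made for the sketch; it also leans
on the cld_lock/cld_list/cld_xid globals and headers quoted above, and
assumes struct cld_msg carries an xid field (cm_xid, as in
<linux/nfsd/cld.h>). The downcall and pipe-setup sides are only described
in comments.

/*
 * Sketch only -- not code from the posted patch.  Embed the rpc_pipe_msg
 * in a slab-allocated, refcounted wrapper so the upcalling task can give
 * up after a timeout while rpc_pipefs can still safely touch the message.
 * A completion stands in for the patch's task pointer + wake_up_process(),
 * since completing an abandoned completion is harmless as long as a
 * reference keeps the wrapper alive.
 */
#include <linux/kref.h>
#include <linux/slab.h>
#include <linux/completion.h>

struct cld_upcall_ref {
        struct kref             cu_ref;
        struct list_head        cu_list;
        struct completion       cu_done;
        struct rpc_pipe_msg     cu_pmsg;
        struct cld_msg          cu_msg;
};

/* hypothetical cache, created at startup with kmem_cache_create() */
static struct kmem_cache *cld_upcall_slab;

static void
cld_upcall_release(struct kref *ref)
{
        struct cld_upcall_ref *cup = container_of(ref, struct cld_upcall_ref,
                                                  cu_ref);

        kmem_cache_free(cld_upcall_slab, cup);
}

/* wired up as the pipe's ->destroy_msg op: rpc_pipefs is done with the msg */
static void
cld_pipe_destroy_msg(struct rpc_pipe_msg *msg)
{
        struct cld_upcall_ref *cup = container_of(msg, struct cld_upcall_ref,
                                                  cu_pmsg);

        /* a pipe error (e.g. -EAGAIN when the daemon dies) ends the wait early */
        if (msg->errno)
                complete(&cup->cu_done);
        kref_put(&cup->cu_ref, cld_upcall_release);
}

static int
cld_pipe_upcall_timed(struct inode *inode, struct cld_msg *cmsg)
{
        int ret;
        struct cld_upcall_ref *cup;

        cup = kmem_cache_zalloc(cld_upcall_slab, GFP_KERNEL);
        if (!cup)
                return -ENOMEM;

        /* two refs: one for us, one for rpc_pipefs (dropped in ->destroy_msg) */
        kref_init(&cup->cu_ref);
        kref_get(&cup->cu_ref);
        init_completion(&cup->cu_done);

        memcpy(&cup->cu_msg, cmsg, sizeof(cup->cu_msg));
        cup->cu_pmsg.data = &cup->cu_msg;
        cup->cu_pmsg.len = sizeof(cup->cu_msg);

        spin_lock(&cld_lock);
        cup->cu_msg.cm_xid = cld_xid++;
        list_add(&cup->cu_list, &cld_list);
        spin_unlock(&cld_lock);

        ret = rpc_queue_upcall(inode, &cup->cu_pmsg);
        if (ret < 0) {
                /* never queued, so rpc_pipefs will not call ->destroy_msg */
                kref_put(&cup->cu_ref, cld_upcall_release);
                goto out_unlink;
        }

        /*
         * Bounded, uninterruptible wait; 30s is just the figure floated above.
         * The downcall side (not shown) looks the upcall up by xid under
         * cld_lock, takes its own reference, copies the reply into cu_msg
         * and calls complete(&cup->cu_done).
         */
        if (wait_for_completion_timeout(&cup->cu_done, 30 * HZ) == 0) {
                ret = -ETIMEDOUT;
        } else {
                ret = cup->cu_pmsg.errno;
                memcpy(cmsg, &cup->cu_msg, sizeof(*cmsg));      /* hand reply back */
        }

out_unlink:
        spin_lock(&cld_lock);
        list_del(&cup->cu_list);
        spin_unlock(&cld_lock);
        kref_put(&cup->cu_ref, cld_upcall_release);
        return ret;
}

The "carefully!" part is what the two references buy: if the wait times
out while the message is still queued or half-read, the caller just
unlinks it from cld_list and drops its reference, and rpc_pipefs's
reference keeps the wrapper alive until ->destroy_msg runs; a downcall
that arrives after that point simply fails its xid lookup instead of
touching freed memory.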