Re: [PATCH v4 09/11] nfsdcld: reopen pipe if it's deleted and recreated

Jeff Layton <jlayton@xxxxxxxxxx> · Thu, 26 Jan 2012 08:28:30 -0500

On Thu, 26 Jan 2012 07:47:51 -0500
Steve Dickson <SteveD@xxxxxxxxxx> wrote:

> 
> 
> On 01/25/2012 06:32 PM, Jeff Layton wrote:
> > On Wed, 25 Jan 2012 17:04:44 -0500
> > Steve Dickson <SteveD@xxxxxxxxxx> wrote:
> > 
> >>
> >>
> >> On 01/25/2012 03:28 PM, Jeff Layton wrote:
> >>> On Wed, 25 Jan 2012 14:31:10 -0500
> >>> Steve Dickson <SteveD@xxxxxxxxxx> wrote:
> >>>
> >>>>
> >>>>
> >>>> On 01/25/2012 02:09 PM, Jeff Layton wrote:
> >>>>> On Wed, 25 Jan 2012 13:16:24 -0500
> >>>>> Steve Dickson <SteveD@xxxxxxxxxx> wrote:
> >>>>>
> >>>>>> Hey Jeff,
> >>>>>>
> >>>>>> Commit inline... 
> >>>>>>
> >>>>>> On 01/23/2012 03:02 PM, Jeff Layton wrote:
> >>>>>>> This can happen if nfsd is shut down and restarted. If that occurs,
> >>>>>>> then reopen the pipe so we're not waiting for data on the defunct
> >>>>>>> pipe.
> >>>>>>>
> >>>>>>> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> >>>>>>> ---
> >>>>>>>  utils/nfsdcld/nfsdcld.c |   84 +++++++++++++++++++++++++++++++++++++++++-----
> >>>>>>>  1 files changed, 74 insertions(+), 10 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/utils/nfsdcld/nfsdcld.c b/utils/nfsdcld/nfsdcld.c
> >>>>>>> index b0c08e2..0dc5b37 100644
> >>>>>>> --- a/utils/nfsdcld/nfsdcld.c
> >>>>>>> +++ b/utils/nfsdcld/nfsdcld.c
> >>>>>>> @@ -57,6 +57,8 @@ struct cld_client {
> >>>>>>>  
> >>>>>>>  /* global variables */
> >>>>>>>  static char *pipepath = DEFAULT_CLD_PATH;
> >>>>>>> +static int 		inotify_fd = -1;
> >>>>>>> +static struct event	pipedir_event;
> >>>>>>>  
> >>>>>>>  static struct option longopts[] =
> >>>>>>>  {
> >>>>>>> @@ -68,8 +70,10 @@ static struct option longopts[] =
> >>>>>>>  	{ NULL, 0, 0, 0 },
> >>>>>>>  };
> >>>>>>>  
> >>>>>>> +
> >>>>>>>  /* forward declarations */
> >>>>>>>  static void cldcb(int UNUSED(fd), short which, void *data);
> >>>>>>> +static void cld_pipe_reopen(struct cld_client *clnt);
> >>>>>>>  
> >>>>>>>  static void
> >>>>>>>  usage(char *progname)
> >>>>>>> @@ -80,10 +84,62 @@ usage(char *progname)
> >>>>>>>  
> >>>>>>>  #define INOTIFY_EVENT_MAX (sizeof(struct inotify_event) + NAME_MAX)
> >>>>>>>  
> >>>>>>> +static void
> >>>>>>> +cld_inotify_cb(int UNUSED(fd), short which, void *data)
> >>>>>>> +{
> >>>>>>> +	int ret, oldfd;
> >>>>>>> +	char evbuf[INOTIFY_EVENT_MAX];
> >>>>>>> +	char *dirc = NULL, *pname;
> >>>>>>> +	struct inotify_event *event = (struct inotify_event *)evbuf;
> >>>>>>> +	struct cld_client *clnt = data;
> >>>>>>> +
> >>>>>>> +	if (which != EV_READ)
> >>>>>>> +		return;
> >>>>>>> +
> >>>>>>> +	dirc = strndup(pipepath, PATH_MAX);
> >>>>>>> +	if (!dirc) {
> >>>>>>> +		xlog_err("%s: unable to allocate memory", __func__);
> >>>>>>> +		goto out;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>> +	ret = read(inotify_fd, evbuf, INOTIFY_EVENT_MAX);
> >>>>>>> +	if (ret < 0) {
> >>>>>>> +		xlog_err("%s: read from inotify fd failed: %m", __func__);
> >>>>>>> +		goto out;
> >>>>>>> +	}
> >>>>>>> +
> >>>>>>> +	/* check to see if we have a filename in the evbuf */
> >>>>>>> +	if (!event->len)
> >>>>>>> +		goto out;
> >>>>>>> +
> >>>>>>> +	pname = basename(dirc);
> >>>>>>> +
> >>>>>>> +	/* does the filename match our pipe? */
> >>>>>>> +	if (strncmp(pname, event->name, event->len))
> >>>>>>> +		goto out;
> >>>>>>> +
> >>>>>>> +	/*
> >>>>>>> +	 * reopen the pipe. The old fd is not closed until the new one is
> >>>>>>> +	 * opened, so we know they should be different if the reopen is
> >>>>>>> +	 * successful.
> >>>>>>> +	 */
> >>>>>>> +	oldfd = clnt->cl_fd;
> >>>>>>> +	do {
> >>>>>>> +		cld_pipe_reopen(clnt);
> >>>>>>> +	} while (oldfd == clnt->cl_fd);
> >>>>>> Doesn't this have a potential for an infinite loop? 
> >>>>>>
> >>>>>> steved.  
> >>>>>
> >>>>>
> >>>>> Yes. If reopening the new pipe continually fails then it will loop
> >>>>> forever.
> >>>> Would it be more accurate to say it would be spinning forever? 
> >>>> Since there is no sleep or delay in cld_pipe_reopen, what's
> >>>> going to stop the daemon from spinning in a CPU bound loop?
> >>>>
> >>>
> >>> Well, not spinning in a userspace loop...it'll continually be cycling on
> >>> an open() call that's not working for whatever reason. We sort of have
> >>> to loop on that though. I think the best we can do is add a sleep(1) in
> >>> there or something. Would that be sufficient?
> >>>
> >> I still think it going to needlessly suck up CPU cycles... 
> >>
> >> The way I handled this in the rpc.idmapd daemon was to do the
> >> reopen on a SIGHUP signal. Then in NFS server initscript 
> >> I did the following:
> >>     /usr/bin/pkill -HUP rpc.idmapd
> >>
> >> Thoughts?
> >>
> > 
> > Ugh, that requires manual intervention if the pipe is removed and
> > recreated. If someone restarts nfsd and doesn't send the signal, then
> > they won't get the upcalls. I'd prefer something that "just works".
> I have not seen any bz open saying rpc.idmapd doesn't just work... 
> 
> > 
> > Seriously, is it that big a deal to just loop here? One open(2) call
> > every second doesn't seem that bad, and honestly if a new pipe pops up
> > and the daemon can't open it then a few CPU cycles is the least of your
> > worries.
> > 
> Put the daemon in that loop and then run the top command in another 
> window.. If the daemon is at the top of the list then it is a big
> deal because that daemon will on the top forever for no reason, in
> the cast of the NFS server not coming back. 
> 

This situation is really unlikely. The daemon does not reopen the pipe
when the old one goes away. It reopens it when a new one with the same
name is recreated in the directory.

That's an important distinction because in order to get into this loop,
you'd need to:

1/ remove the old pipe -- this happens when the daemon is shut down

2/ create a new pipe -- this happens when the daemon is restarted

3/ not be able to open the new pipe for some reason, even though you
were able to open the old one

The reason I put this in a loop is because it's possible (though not
likely) that you'd hit condition #3 temporarily. In that event, looping
and retrying an open(2) call every second seems entirely reasonable and
is more fault tolerant than just dying here. The open of a pipe takes
much less than 1s, so there's plenty of time between open attempts for
the machine to get other things done.

If it turns out that there's a problem, the admin can shut down the
daemon at that point. They may need to do so anyway in order to resolve
the situation if the thing preventing the opening of the pipe isn't
temporary.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html