On Thu, 26 Jan 2012 08:28:30 -0500 Jeff Layton <jlayton@xxxxxxxxxx> wrote: > On Thu, 26 Jan 2012 07:47:51 -0500 > Steve Dickson <SteveD@xxxxxxxxxx> wrote: > > > > > > > On 01/25/2012 06:32 PM, Jeff Layton wrote: > > > On Wed, 25 Jan 2012 17:04:44 -0500 > > > Steve Dickson <SteveD@xxxxxxxxxx> wrote: > > > > > >> > > >> > > >> On 01/25/2012 03:28 PM, Jeff Layton wrote: > > >>> On Wed, 25 Jan 2012 14:31:10 -0500 > > >>> Steve Dickson <SteveD@xxxxxxxxxx> wrote: > > >>> > > >>>> > > >>>> > > >>>> On 01/25/2012 02:09 PM, Jeff Layton wrote: > > >>>>> On Wed, 25 Jan 2012 13:16:24 -0500 > > >>>>> Steve Dickson <SteveD@xxxxxxxxxx> wrote: > > >>>>> > > >>>>>> Hey Jeff, > > >>>>>> > > >>>>>> Commit inline... > > >>>>>> > > >>>>>> On 01/23/2012 03:02 PM, Jeff Layton wrote: > > >>>>>>> This can happen if nfsd is shut down and restarted. If that occurs, > > >>>>>>> then reopen the pipe so we're not waiting for data on the defunct > > >>>>>>> pipe. > > >>>>>>> > > >>>>>>> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > >>>>>>> --- > > >>>>>>> utils/nfsdcld/nfsdcld.c | 84 +++++++++++++++++++++++++++++++++++++++++----- > > >>>>>>> 1 files changed, 74 insertions(+), 10 deletions(-) > > >>>>>>> > > >>>>>>> diff --git a/utils/nfsdcld/nfsdcld.c b/utils/nfsdcld/nfsdcld.c > > >>>>>>> index b0c08e2..0dc5b37 100644 > > >>>>>>> --- a/utils/nfsdcld/nfsdcld.c > > >>>>>>> +++ b/utils/nfsdcld/nfsdcld.c > > >>>>>>> @@ -57,6 +57,8 @@ struct cld_client { > > >>>>>>> > > >>>>>>> /* global variables */ > > >>>>>>> static char *pipepath = DEFAULT_CLD_PATH; > > >>>>>>> +static int inotify_fd = -1; > > >>>>>>> +static struct event pipedir_event; > > >>>>>>> > > >>>>>>> static struct option longopts[] = > > >>>>>>> { > > >>>>>>> @@ -68,8 +70,10 @@ static struct option longopts[] = > > >>>>>>> { NULL, 0, 0, 0 }, > > >>>>>>> }; > > >>>>>>> > > >>>>>>> + > > >>>>>>> /* forward declarations */ > > >>>>>>> static void cldcb(int UNUSED(fd), short which, void *data); > > >>>>>>> +static void cld_pipe_reopen(struct cld_client *clnt); > > >>>>>>> > > >>>>>>> static void > > >>>>>>> usage(char *progname) > > >>>>>>> @@ -80,10 +84,62 @@ usage(char *progname) > > >>>>>>> > > >>>>>>> #define INOTIFY_EVENT_MAX (sizeof(struct inotify_event) + NAME_MAX) > > >>>>>>> > > >>>>>>> +static void > > >>>>>>> +cld_inotify_cb(int UNUSED(fd), short which, void *data) > > >>>>>>> +{ > > >>>>>>> + int ret, oldfd; > > >>>>>>> + char evbuf[INOTIFY_EVENT_MAX]; > > >>>>>>> + char *dirc = NULL, *pname; > > >>>>>>> + struct inotify_event *event = (struct inotify_event *)evbuf; > > >>>>>>> + struct cld_client *clnt = data; > > >>>>>>> + > > >>>>>>> + if (which != EV_READ) > > >>>>>>> + return; > > >>>>>>> + > > >>>>>>> + dirc = strndup(pipepath, PATH_MAX); > > >>>>>>> + if (!dirc) { > > >>>>>>> + xlog_err("%s: unable to allocate memory", __func__); > > >>>>>>> + goto out; > > >>>>>>> + } > > >>>>>>> + > > >>>>>>> + ret = read(inotify_fd, evbuf, INOTIFY_EVENT_MAX); > > >>>>>>> + if (ret < 0) { > > >>>>>>> + xlog_err("%s: read from inotify fd failed: %m", __func__); > > >>>>>>> + goto out; > > >>>>>>> + } > > >>>>>>> + > > >>>>>>> + /* check to see if we have a filename in the evbuf */ > > >>>>>>> + if (!event->len) > > >>>>>>> + goto out; > > >>>>>>> + > > >>>>>>> + pname = basename(dirc); > > >>>>>>> + > > >>>>>>> + /* does the filename match our pipe? */ > > >>>>>>> + if (strncmp(pname, event->name, event->len)) > > >>>>>>> + goto out; > > >>>>>>> + > > >>>>>>> + /* > > >>>>>>> + * reopen the pipe. The old fd is not closed until the new one is > > >>>>>>> + * opened, so we know they should be different if the reopen is > > >>>>>>> + * successful. > > >>>>>>> + */ > > >>>>>>> + oldfd = clnt->cl_fd; > > >>>>>>> + do { > > >>>>>>> + cld_pipe_reopen(clnt); > > >>>>>>> + } while (oldfd == clnt->cl_fd); > > >>>>>> Doesn't this have a potential for an infinite loop? > > >>>>>> > > >>>>>> steved. > > >>>>> > > >>>>> > > >>>>> Yes. If reopening the new pipe continually fails then it will loop > > >>>>> forever. > > >>>> Would it be more accurate to say it would be spinning forever? > > >>>> Since there is no sleep or delay in cld_pipe_reopen, what's > > >>>> going to stop the daemon from spinning in a CPU bound loop? > > >>>> > > >>> > > >>> Well, not spinning in a userspace loop...it'll continually be cycling on > > >>> an open() call that's not working for whatever reason. We sort of have > > >>> to loop on that though. I think the best we can do is add a sleep(1) in > > >>> there or something. Would that be sufficient? > > >>> > > >> I still think it going to needlessly suck up CPU cycles... > > >> > > >> The way I handled this in the rpc.idmapd daemon was to do the > > >> reopen on a SIGHUP signal. Then in NFS server initscript > > >> I did the following: > > >> /usr/bin/pkill -HUP rpc.idmapd > > >> > > >> Thoughts? > > >> > > > > > > Ugh, that requires manual intervention if the pipe is removed and > > > recreated. If someone restarts nfsd and doesn't send the signal, then > > > they won't get the upcalls. I'd prefer something that "just works". > > I have not seen any bz open saying rpc.idmapd doesn't just work... > > > > > > > > Seriously, is it that big a deal to just loop here? One open(2) call > > > every second doesn't seem that bad, and honestly if a new pipe pops up > > > and the daemon can't open it then a few CPU cycles is the least of your > > > worries. > > > > > Put the daemon in that loop and then run the top command in another > > window.. If the daemon is at the top of the list then it is a big > > deal because that daemon will on the top forever for no reason, in > > the cast of the NFS server not coming back. > > > > This situation is really unlikely. The daemon does not reopen the pipe > when the old one goes away. It reopens it when a new one with the same > name is recreated in the directory. > > That's an important distinction because in order to get into this loop, > you'd need to: > > 1/ remove the old pipe -- this happens when the daemon is shut down > > 2/ create a new pipe -- this happens when the daemon is restarted > To clarify, the above happen when knfsd are stopped and started... > 3/ not be able to open the new pipe for some reason, even though you > were able to open the old one > > The reason I put this in a loop is because it's possible (though not > likely) that you'd hit condition #3 temporarily. In that event, looping > and retrying an open(2) call every second seems entirely reasonable and > is more fault tolerant than just dying here. The open of a pipe takes > much less than 1s, so there's plenty of time between open attempts for > the machine to get other things done. > > If it turns out that there's a problem, the admin can shut down the > daemon at that point. They may need to do so anyway in order to resolve > the situation if the thing preventing the opening of the pipe isn't > temporary. > -- Jeff Layton <jlayton@xxxxxxxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html