Re: AutoFS fails to add new entries from auto.master after SIGHUP

Ian Kent <raven@xxxxxxxxxx> · Fri, 08 Mar 2013 13:02:32 +0800

On Wed, 2013-03-06 at 21:55 -0300, Leonardo Chiquitto wrote:
> On Wed, Mar 6, 2013 at 12:07 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
> > On Wed, 2013-03-06 at 09:39 +0800, Ian Kent wrote:
> >> On Tue, 2013-03-05 at 19:05 -0300, Leonardo Chiquitto wrote:
> >> > On Tue, Mar 5, 2013 at 12:54 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
> >> > > On Fri, 2013-03-01 at 18:42 -0300, Leonardo Chiquitto wrote:
> >> > >> On Fri, Mar 1, 2013 at 5:44 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
> >> > >> > On Wed, 2013-02-27 at 22:49 -0300, Leonardo Chiquitto wrote:
> >> > >> >> Hello,
> >> > >> >>
> >> > >> >> I've got a bug report describing a case where AutoFS fails to read new
> >> > >> >> entries from /etc/auto.master after a SIGHUP. Although the problem
> >> > >> >> was reported in an older version of automount, it is reproducible using
> >> > >> >> the latest revision from git.
> >> > >> >>
> >> > >> >> How to reproduce:
> >> > >> >>
> >> > >> >> 1. Add at least two sources of AutoFS maps to /etc/nsswitch.conf. I've
> >> > >> >> tested only with "files nis". You don't need to configure NIS/YP, just
> >> > >> >> having it listed there in the configuration is enough.
> >> > >> >>
> >> > >> >> 2. Start the automounter with a simple /etc/auto.master:
> >> > >> >>
> >> > >> >> /nfs1 /etc/auto.test1
> >> > >> >> +auto.master
> >> > >> >>
> >> > >> >> 3. Add another entry to /etc/auto.master:
> >> > >> >>
> >> > >> >> /nfs1 /etc/auto.test1
> >> > >> >> /nfs2 /etc/auto.test2
> >> > >> >> +auto.master
> >> > >> >>
> >> > >> >> and reload the daemon. Notice that although AutoFS reads /etc/auto.test2,
> >> > >> >> /nfs2 is not created/mounted.
> >> > >> >>
> >> > >> >> 4. Try to stop the daemon cleanly (SIGTERM only). You'll notice that it won't
> >> > >> >> quit. SIGKILL is necessary.
> >> > >> >
> >> > >> > While the failure to exit when this occurs is a worry probably due to
> >> > >> > the presence of a master map entry in the list that was not acted upon.
> >> > >> > We probably should come up with way to produce this problem after the
> >> > >> > re-read problem is fixed so we can check what is actually happening.
> >> > >>
> >> > >> Understood. I'll try to write a patch to make it ignore these "partial" entries
> >> > >> when exiting.
> >> > >>
> >> > >> > In the mean time have a look at this patch.
> >> > >> >
> >> > >> > autofs-5.0.7 - fix map read fail incorrectly set on master re-read
> >> > >>
> >> > >> I tried it here and unfortunately I'm still able to reproduce the problem.
> >> > >> I haven't had time to debug it further yet, but my *impression* is that
> >> > >> when reading the entry "+auto.master" from /etc/auto.master it
> >> > >> fails with nss source "files" due to the recursion and sets read_fail to 1.
> >> > >> Next it tries with nss source "nis" and succeeds, lookup_nss_read_master()
> >> > >> returns SUCCESS and the "read_fail = 0" added to lookup_read_master()
> >> > >> is never executed.
> >> > >
> >> > > Right, I think that means that to reproduce the problem a second map
> >> > > source must be specified and the read must must succeed. I used a second
> >> > > source that failed when I did this but using one that succeeds does
> >> > > produce the bad behaviour.
> >> > >
> >> > > The principle is still the same though. I think it is not correct to
> >> > > return a failure for the self included file map since we essentially
> >> > > want to ignore that source.
> >> > >
> >> > > The problem of entries in the master map mounts list that haven't been
> >> > > completely setup is a bit more difficult to deal with. On the face of
> >> > > it, for indirect mounts, simply continuing to the next entry if it isn't
> >> > > mounted should be enough but I'm not yet clear on the implications for
> >> > > the direct map.
> >> > >
> >> > > How about trying this patch:
> >> > >
> >> > > autofs-5.0.7 - dont fail on master map self include
> >> >
> >> > Hello Ian,
> >> >
> >> > I just tested it and it fails differently: when reading the entry "+auto.master"
> >> > from /etc/auto.master it first tries nss source "files" and succeed (because
> >> > recursion is now handled as !failure). This makes lookup_nss_read_master()
> >> > return even earlier, not even trying the second nss source:
> >> >
> >> >  220         /* First one gets it */
> >> >  221         head = &nsslist;
> >> >  222         list_for_each(p, head) {
> >> >
> >> > (...)
> >> >
> >> >  262                 status = check_nss_result(this, result);
> >> >  263                 if (status >= 0) {
> >> >  264                         free_sources(&nsslist);
> >> >  265                         return status;
> >> >  266                 }
> >> >
> >> > In my setup, this means all maps in (NIS) auto.master are not read
> >> > when AutoFS starts.
> >>
> >> Ahh ... right, I get it.
> >
> > My mistake is returning NSS_STATUS_SUCCESS whose default nsswitch action
> > is to return. Returning a status whose default action is to continue,
> > like NSS_STATUS_TRYAGAIN, would do the same thing as your patch but
> > without introducing a status that isn't part of the nsswitch definition.
> >
> > I put quite a bit of effort into not adding status values that weren't
> > part of the definition when I did this in an attempt to minimize porting
> > effort if I decided to (or had to for some other reason) use the glibc
> > nss interface.
> >
> > While that premise is probably not relevant any more there is still a
> > remote possibility that autofs could change to using the glibc nss
> > interface.
> >
> > So, it's not that there is anything wrong with your patch, it just goes
> > outside of one of the original aims of the implementation.
> >
> > The reason autofs doesn't use glibc is also still the same, that being
> > "all" the source modules would need to be ported to glibc nss and
> > writing a simple internal parser was much less effort.
> 
> Thanks for the context!
> 
> > Can you try changing the return to NSS_STATUS_TRYAGAIN in the two places
> > of the original patch and try it again please.
> 
> Yes, with NSS_STATUS_TRYAGAIN it works too. When I wrote the
> patch, I thought for a while about which status code to use, but no
> option looked like a perfect fit. I belive TRYAGAIN means that the caller
> should try the same NSS source again, right? So using it here would
> distort the meaning a little bit (as in "try again with the next source").
> Nevertheless, I think this is an acceptable solution for the problem.

That's a good point and another one of the reasons I didn't like the
glibc nss interface. When I did a trial implementation I found the
definition of what returns codes to use and how glibc nss might use them
unclear.

I agree that NSS_STATUS_TRYAGAIN sounds like it implies try again with
the same source but there is no try next code and the unavailable code
seems even less appropriate.

I'll go ahead and commit the changes I have queued, including this one
and your file handle leak change.

Ian

--
To unsubscribe from this list: send the line "unsubscribe autofs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html