Re: AutoFS fails to add new entries from auto.master after SIGHUP

Leonardo Chiquitto <leonardo.lists@xxxxxxxxx> · Fri, 8 Mar 2013 09:10:42 -0300

On Fri, Mar 8, 2013 at 2:02 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
> On Wed, 2013-03-06 at 21:55 -0300, Leonardo Chiquitto wrote:
>> On Wed, Mar 6, 2013 at 12:07 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
>> > On Wed, 2013-03-06 at 09:39 +0800, Ian Kent wrote:
>> >> On Tue, 2013-03-05 at 19:05 -0300, Leonardo Chiquitto wrote:
>> >> > On Tue, Mar 5, 2013 at 12:54 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
>> >> > > On Fri, 2013-03-01 at 18:42 -0300, Leonardo Chiquitto wrote:
>> >> > >> On Fri, Mar 1, 2013 at 5:44 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
>> >> > >> > On Wed, 2013-02-27 at 22:49 -0300, Leonardo Chiquitto wrote:
>> >> > >> >> Hello,
>> >> > >> >>
>> >> > >> >> I've got a bug report describing a case where AutoFS fails to read new
>> >> > >> >> entries from /etc/auto.master after a SIGHUP. Although the problem
>> >> > >> >> was reported in an older version of automount, it is reproducible using
>> >> > >> >> the latest revision from git.
>> >> > >> >>
>> >> > >> >> How to reproduce:
>> >> > >> >>
>> >> > >> >> 1. Add at least two sources of AutoFS maps to /etc/nsswitch.conf. I've
>> >> > >> >> tested only with "files nis". You don't need to configure NIS/YP, just
>> >> > >> >> having it listed there in the configuration is enough.
>> >> > >> >>
>> >> > >> >> 2. Start the automounter with a simple /etc/auto.master:
>> >> > >> >>
>> >> > >> >> /nfs1 /etc/auto.test1
>> >> > >> >> +auto.master
>> >> > >> >>
>> >> > >> >> 3. Add another entry to /etc/auto.master:
>> >> > >> >>
>> >> > >> >> /nfs1 /etc/auto.test1
>> >> > >> >> /nfs2 /etc/auto.test2
>> >> > >> >> +auto.master
>> >> > >> >>
>> >> > >> >> and reload the daemon. Notice that although AutoFS reads /etc/auto.test2,
>> >> > >> >> /nfs2 is not created/mounted.
>> >> > >> >>
>> >> > >> >> 4. Try to stop the daemon cleanly (SIGTERM only). You'll notice that it won't
>> >> > >> >> quit. SIGKILL is necessary.
>> >> > >> >
>> >> > >> > While the failure to exit when this occurs is a worry probably due to
>> >> > >> > the presence of a master map entry in the list that was not acted upon.
>> >> > >> > We probably should come up with way to produce this problem after the
>> >> > >> > re-read problem is fixed so we can check what is actually happening.
>> >> > >>
>> >> > >> Understood. I'll try to write a patch to make it ignore these "partial" entries
>> >> > >> when exiting.
>> >> > >>
>> >> > >> > In the mean time have a look at this patch.
>> >> > >> >
>> >> > >> > autofs-5.0.7 - fix map read fail incorrectly set on master re-read
>> >> > >>
>> >> > >> I tried it here and unfortunately I'm still able to reproduce the problem.
>> >> > >> I haven't had time to debug it further yet, but my *impression* is that
>> >> > >> when reading the entry "+auto.master" from /etc/auto.master it
>> >> > >> fails with nss source "files" due to the recursion and sets read_fail to 1.
>> >> > >> Next it tries with nss source "nis" and succeeds, lookup_nss_read_master()
>> >> > >> returns SUCCESS and the "read_fail = 0" added to lookup_read_master()
>> >> > >> is never executed.
>> >> > >
>> >> > > Right, I think that means that to reproduce the problem a second map
>> >> > > source must be specified and the read must must succeed. I used a second
>> >> > > source that failed when I did this but using one that succeeds does
>> >> > > produce the bad behaviour.
>> >> > >
>> >> > > The principle is still the same though. I think it is not correct to
>> >> > > return a failure for the self included file map since we essentially
>> >> > > want to ignore that source.
>> >> > >
>> >> > > The problem of entries in the master map mounts list that haven't been
>> >> > > completely setup is a bit more difficult to deal with. On the face of
>> >> > > it, for indirect mounts, simply continuing to the next entry if it isn't
>> >> > > mounted should be enough but I'm not yet clear on the implications for
>> >> > > the direct map.
>> >> > >
>> >> > > How about trying this patch:
>> >> > >
>> >> > > autofs-5.0.7 - dont fail on master map self include
>> >> >
>> >> > Hello Ian,
>> >> >
>> >> > I just tested it and it fails differently: when reading the entry "+auto.master"
>> >> > from /etc/auto.master it first tries nss source "files" and succeed (because
>> >> > recursion is now handled as !failure). This makes lookup_nss_read_master()
>> >> > return even earlier, not even trying the second nss source:
>> >> >
>> >> >  220         /* First one gets it */
>> >> >  221         head = &nsslist;
>> >> >  222         list_for_each(p, head) {
>> >> >
>> >> > (...)
>> >> >
>> >> >  262                 status = check_nss_result(this, result);
>> >> >  263                 if (status >= 0) {
>> >> >  264                         free_sources(&nsslist);
>> >> >  265                         return status;
>> >> >  266                 }
>> >> >
>> >> > In my setup, this means all maps in (NIS) auto.master are not read
>> >> > when AutoFS starts.
>> >>
>> >> Ahh ... right, I get it.
>> >
>> > My mistake is returning NSS_STATUS_SUCCESS whose default nsswitch action
>> > is to return. Returning a status whose default action is to continue,
>> > like NSS_STATUS_TRYAGAIN, would do the same thing as your patch but
>> > without introducing a status that isn't part of the nsswitch definition.
>> >
>> > I put quite a bit of effort into not adding status values that weren't
>> > part of the definition when I did this in an attempt to minimize porting
>> > effort if I decided to (or had to for some other reason) use the glibc
>> > nss interface.
>> >
>> > While that premise is probably not relevant any more there is still a
>> > remote possibility that autofs could change to using the glibc nss
>> > interface.
>> >
>> > So, it's not that there is anything wrong with your patch, it just goes
>> > outside of one of the original aims of the implementation.
>> >
>> > The reason autofs doesn't use glibc is also still the same, that being
>> > "all" the source modules would need to be ported to glibc nss and
>> > writing a simple internal parser was much less effort.
>>
>> Thanks for the context!
>>
>> > Can you try changing the return to NSS_STATUS_TRYAGAIN in the two places
>> > of the original patch and try it again please.
>>
>> Yes, with NSS_STATUS_TRYAGAIN it works too. When I wrote the
>> patch, I thought for a while about which status code to use, but no
>> option looked like a perfect fit. I belive TRYAGAIN means that the caller
>> should try the same NSS source again, right? So using it here would
>> distort the meaning a little bit (as in "try again with the next source").
>> Nevertheless, I think this is an acceptable solution for the problem.
>
> That's a good point and another one of the reasons I didn't like the
> glibc nss interface. When I did a trial implementation I found the
> definition of what returns codes to use and how glibc nss might use them
> unclear.
>
> I agree that NSS_STATUS_TRYAGAIN sounds like it implies try again with
> the same source but there is no try next code and the unavailable code
> seems even less appropriate.
>
> I'll go ahead and commit the changes I have queued, including this one
> and your file handle leak change.

Thank you!

Leonardo
--
To unsubscribe from this list: send the line "unsubscribe autofs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html