On Fri, Mar 8, 2013 at 2:02 AM, Ian Kent <raven@xxxxxxxxxx> wrote: > On Wed, 2013-03-06 at 21:55 -0300, Leonardo Chiquitto wrote: >> On Wed, Mar 6, 2013 at 12:07 AM, Ian Kent <raven@xxxxxxxxxx> wrote: >> > On Wed, 2013-03-06 at 09:39 +0800, Ian Kent wrote: >> >> On Tue, 2013-03-05 at 19:05 -0300, Leonardo Chiquitto wrote: >> >> > On Tue, Mar 5, 2013 at 12:54 AM, Ian Kent <raven@xxxxxxxxxx> wrote: >> >> > > On Fri, 2013-03-01 at 18:42 -0300, Leonardo Chiquitto wrote: >> >> > >> On Fri, Mar 1, 2013 at 5:44 AM, Ian Kent <raven@xxxxxxxxxx> wrote: >> >> > >> > On Wed, 2013-02-27 at 22:49 -0300, Leonardo Chiquitto wrote: >> >> > >> >> Hello, >> >> > >> >> >> >> > >> >> I've got a bug report describing a case where AutoFS fails to read new >> >> > >> >> entries from /etc/auto.master after a SIGHUP. Although the problem >> >> > >> >> was reported in an older version of automount, it is reproducible using >> >> > >> >> the latest revision from git. >> >> > >> >> >> >> > >> >> How to reproduce: >> >> > >> >> >> >> > >> >> 1. Add at least two sources of AutoFS maps to /etc/nsswitch.conf. I've >> >> > >> >> tested only with "files nis". You don't need to configure NIS/YP, just >> >> > >> >> having it listed there in the configuration is enough. >> >> > >> >> >> >> > >> >> 2. Start the automounter with a simple /etc/auto.master: >> >> > >> >> >> >> > >> >> /nfs1 /etc/auto.test1 >> >> > >> >> +auto.master >> >> > >> >> >> >> > >> >> 3. Add another entry to /etc/auto.master: >> >> > >> >> >> >> > >> >> /nfs1 /etc/auto.test1 >> >> > >> >> /nfs2 /etc/auto.test2 >> >> > >> >> +auto.master >> >> > >> >> >> >> > >> >> and reload the daemon. Notice that although AutoFS reads /etc/auto.test2, >> >> > >> >> /nfs2 is not created/mounted. >> >> > >> >> >> >> > >> >> 4. Try to stop the daemon cleanly (SIGTERM only). You'll notice that it won't >> >> > >> >> quit. SIGKILL is necessary. >> >> > >> > >> >> > >> > While the failure to exit when this occurs is a worry probably due to >> >> > >> > the presence of a master map entry in the list that was not acted upon. >> >> > >> > We probably should come up with way to produce this problem after the >> >> > >> > re-read problem is fixed so we can check what is actually happening. >> >> > >> >> >> > >> Understood. I'll try to write a patch to make it ignore these "partial" entries >> >> > >> when exiting. >> >> > >> >> >> > >> > In the mean time have a look at this patch. >> >> > >> > >> >> > >> > autofs-5.0.7 - fix map read fail incorrectly set on master re-read >> >> > >> >> >> > >> I tried it here and unfortunately I'm still able to reproduce the problem. >> >> > >> I haven't had time to debug it further yet, but my *impression* is that >> >> > >> when reading the entry "+auto.master" from /etc/auto.master it >> >> > >> fails with nss source "files" due to the recursion and sets read_fail to 1. >> >> > >> Next it tries with nss source "nis" and succeeds, lookup_nss_read_master() >> >> > >> returns SUCCESS and the "read_fail = 0" added to lookup_read_master() >> >> > >> is never executed. >> >> > > >> >> > > Right, I think that means that to reproduce the problem a second map >> >> > > source must be specified and the read must must succeed. I used a second >> >> > > source that failed when I did this but using one that succeeds does >> >> > > produce the bad behaviour. >> >> > > >> >> > > The principle is still the same though. I think it is not correct to >> >> > > return a failure for the self included file map since we essentially >> >> > > want to ignore that source. >> >> > > >> >> > > The problem of entries in the master map mounts list that haven't been >> >> > > completely setup is a bit more difficult to deal with. On the face of >> >> > > it, for indirect mounts, simply continuing to the next entry if it isn't >> >> > > mounted should be enough but I'm not yet clear on the implications for >> >> > > the direct map. >> >> > > >> >> > > How about trying this patch: >> >> > > >> >> > > autofs-5.0.7 - dont fail on master map self include >> >> > >> >> > Hello Ian, >> >> > >> >> > I just tested it and it fails differently: when reading the entry "+auto.master" >> >> > from /etc/auto.master it first tries nss source "files" and succeed (because >> >> > recursion is now handled as !failure). This makes lookup_nss_read_master() >> >> > return even earlier, not even trying the second nss source: >> >> > >> >> > 220 /* First one gets it */ >> >> > 221 head = &nsslist; >> >> > 222 list_for_each(p, head) { >> >> > >> >> > (...) >> >> > >> >> > 262 status = check_nss_result(this, result); >> >> > 263 if (status >= 0) { >> >> > 264 free_sources(&nsslist); >> >> > 265 return status; >> >> > 266 } >> >> > >> >> > In my setup, this means all maps in (NIS) auto.master are not read >> >> > when AutoFS starts. >> >> >> >> Ahh ... right, I get it. >> > >> > My mistake is returning NSS_STATUS_SUCCESS whose default nsswitch action >> > is to return. Returning a status whose default action is to continue, >> > like NSS_STATUS_TRYAGAIN, would do the same thing as your patch but >> > without introducing a status that isn't part of the nsswitch definition. >> > >> > I put quite a bit of effort into not adding status values that weren't >> > part of the definition when I did this in an attempt to minimize porting >> > effort if I decided to (or had to for some other reason) use the glibc >> > nss interface. >> > >> > While that premise is probably not relevant any more there is still a >> > remote possibility that autofs could change to using the glibc nss >> > interface. >> > >> > So, it's not that there is anything wrong with your patch, it just goes >> > outside of one of the original aims of the implementation. >> > >> > The reason autofs doesn't use glibc is also still the same, that being >> > "all" the source modules would need to be ported to glibc nss and >> > writing a simple internal parser was much less effort. >> >> Thanks for the context! >> >> > Can you try changing the return to NSS_STATUS_TRYAGAIN in the two places >> > of the original patch and try it again please. >> >> Yes, with NSS_STATUS_TRYAGAIN it works too. When I wrote the >> patch, I thought for a while about which status code to use, but no >> option looked like a perfect fit. I belive TRYAGAIN means that the caller >> should try the same NSS source again, right? So using it here would >> distort the meaning a little bit (as in "try again with the next source"). >> Nevertheless, I think this is an acceptable solution for the problem. > > That's a good point and another one of the reasons I didn't like the > glibc nss interface. When I did a trial implementation I found the > definition of what returns codes to use and how glibc nss might use them > unclear. > > I agree that NSS_STATUS_TRYAGAIN sounds like it implies try again with > the same source but there is no try next code and the unavailable code > seems even less appropriate. > > I'll go ahead and commit the changes I have queued, including this one > and your file handle leak change. Thank you! Leonardo -- To unsubscribe from this list: send the line "unsubscribe autofs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html