Re: [PATCH] Test for ldap connectivity during lookup_ldap initialization

Ian Kent <raven@xxxxxxxxxx> · Wed, 16 Aug 2017 17:39:01 +0800

On 16/08/17 16:25, Ian Kent wrote:
> On 16/08/17 15:21, Oscar Salvador wrote:
>>
>>
>> On 08/16/2017 07:46 AM, Ian Kent wrote:
>>> On 16/08/17 13:02, Ian Kent wrote:
>>>> On 11/08/17 05:15, Oscar Salvador wrote:
>>>>> Right now we are just checking and parsing the ldap's configuration
>>>>> in lookup_init() function, and if everything goes well we return a zero
>>>>> value indicating that everything worked fine.
>>>>>
>>>>> This has a problem, because in the case we set ldap method in
>>>>> /etc/nsswitch.conf, configuration parsing goes fine but ldap is not working,
>>>>> we will return a zero value in lookup_init() function, so autofs will move on,
>>>>> but then we will return NSS_STATUS_UNAVAIL in lookup_read_master() function
>>>>> once do_reconnect() fails, and because of this autofs will not be able to
>>>>> reread the maps.
>>>>
>>>> Umm ... I can't see a check in the current upstream NIS code?
>>>
>>> I also don't understand the problem in the upstream code your trying to
>>> resolve. Can you try and describe it again please?
>>
>>
>> Sure, I'm going to try to explain it, hopefully better:
>>
>> Let's say we set "files" and "nis" method in /etc/nsswitch.conf, just like this:
>>
>> # grep auto /etc/nsswitch.conf
>> automount:    files nis
>>
>>
>> And then let's say we have this configuration:
>>
>> # grep -v '^#' /etc/auto.master
>> /- auto.host
>> +auto.master
>>
>> # cat /etc/auto.host
>> /data1  xx.xx.xx.xx:/mnt/export1
>> /data2  xx.xx.xx.xx:/mnt/export2
>> /data3    xx.xx.xx.xx:/mnt/export3
>>
>>
>> Once we start autofs, we should have the mounts:
>>
>> auto.host on /data3 type autofs (rw,relatime,fd=7,pgrp=2128,timeout=300,minproto=5,maxproto=5,direct)
>> auto.host on /data2 type autofs (rw,relatime,fd=7,pgrp=2128,timeout=300,minproto=5,maxproto=5,direct)
>> auto.host on /data1 type autofs (rw,relatime,fd=7,pgrp=2128,timeout=300,minproto=5,maxproto=5,direct)
>>
>>
>> Then, let's say we want to remove one mount, so we comment one of the mounts in /etc/auto.host, and then we send the
>> -HUP signal to the autofs process.
>>
>> The result of signaling autofs -HUP is that it re-reads the config.
>>
>> So, we call:
>>
>> lookup_nss_read_master() -> do_read_master() -> open_lookup() ->
>>
>> fails?
>>      |->yes: return NSS_STATUS_NOTFOUND
>>      |-> no: lookup->lookup_read_master()
>>
>>
>> The point of open_lookup() is to try to initialize the library by calling mod->lookup_init() (nis, ldap, ...),
>> and in case it fails, we return NSS_STATUS_NOTFOUND, and do_read_master() returns NSS_STATUS_NOTFOUND too.
>>
>> This is possible because in lookup_init() for lookup_yp.c we check if we are bound to a domain:
>>
>>     if (!ctxt->domainname) {
>>                 char *domainname;
>>                 /* This should, but doesn't, take a const char ** */
>>                 err = yp_get_default_domain(&domainname);
> 
> I didn't check if that returns bound domain of configured domain.
> I'll do that.

yp_get_default_domain() doesn't do a network connection, it just gets the
system configured NIS domain name, it can't be relied upon to check
connectivity.

> 
>>                 if (err) {
>>                         logerr(MODPREFIX
>>                               "map %s: %s", ctxt->mapname, yperr_string(err));
>>                         ret = 1;
>>                         goto out;
>>                 }
>>         ...
>>         ...
>>     }
>>
>>
>> In my case it fails because I'm not bound to any nis domain, so since mod->lookup_init() fails,
>> open_lookup() returns NSS_STATUS_NOTFOUND and we just return instead of moving on.
>>
>>
>>
>> Now, this doesn't work for ldap.
>> If we replace "nis" for "ldap" in /etc/nsswitch.conf, like:
>>
>> # grep auto /etc/nsswitch.conf
>> automount:    files ldap
>>
>>
>> And then we comment one mount and try to reload autofs with HUP signal, the process is going to be different.
>>
>> The point with ldap is that we don't really check if the configuration for ldap works or not, we just try to initialize it,
>> but we don't check if we can connect to the ldap server and do a bind operation, so even if our ldap is broken lookup_init()
>> from ldap is going to return success, letting us move on and call lookup->lookup_read_master(), which is going to fail hard
>> in the first call to do_reconnect() in lookup_read_master() function.
>>
>> The problem with this is that there is a big difference in failing in open_lookup() or failing in lookup->lookup_read_master(),
>> and that's it if we return NSS_STATUS_NOTFOUND or NSS_STATUS_UNAVAIL.
>> In case we return the last one, lookup_nss_read_master() is going to do:
>>
>>     if (result == NSS_STATUS_UNAVAIL)
>>         master->read_fail = 1;
>>
>> and this will not let autofs re-read its configuration and remove the mounts we commented.

I can't duplicate this with the current upstream source but I do think there
was a problem with HUP signal processing recently.

In any case, for the HUP signal processing master->read_fail = 1 shouldn't
prevent the mounted mounts update ....

       lookup_nss_read_master(master, age);
       ...

       if (!master->read_fail)
                master_mount_mounts(master, age, readall);
        else {
                master->read_fail = 0;
                /* HUP signal sets readall == 1 only */   <== see here.
                if (!readall)
                        return 0;
                else
                        master_mount_mounts(master, age, readall);
        }

>>
>>
>> Given that said, what I wanted to achieve with my patch is to really check if we got a working ldap setup
>> by trying to call do_reconnect() in do_init(), because it tries to connect to the ldap server found in the
>> config and do a bind operation, which should be enough to test if our ldap works or not.
>>
>> In my case, without the patch, if I try to reload the autofs config with the -HUP signal, autofs won't do it.
>> With the patch, since I return NSS_STATUS_NOTFOUND if lookup_init() fails, autofs is able to re-read the config.
>>
>> I hope I was able to explain it better this time.
> 
> Yes, this is what I was hoping for, thanks.
> 
>>
>> This being said, I'm not sure if we way I chose to fix this is the best one.
> 
> I need to check because there has been quite a bit of recent change in this area.
> 
> The unfortunate thing is lookup_init() probably should be returning
> NSS_STATUS_UNAVAIL for the overall semantics of how this is meant to work.
> 
> That it returns NSS_STATUS_NOTFOUND is not strictly right because lookup_init()
> can't know if the master map being looked for exists or not so an NSS_STATUS_UNAVAIL
> should allow it to continue on.

I think this return isn't right, it probably should be returning NSS_STATUS_UNAVAIL
but I'd rather not change that right now either.

> 
> What's worse is the semantics change a bit between startup and map re-read and a
> change to one case can break the other.
> 
> Let me have a look around.

Could you have a look at:
autofs-5.1.2-wait-for-master-map-available-at-start.patch
autofs-5.1.2-add-master-read-wait-option.patch
autofs-5.1.2-work-around-sss-startup-delay.patch
autofs-5.1.2-add-sss-master-map-wait-config-option.patch
autofs-5.1.2-fix-included-master-map-not-found-return.patch
autofs-5.1.2-dont-fail-on-master-map-read-fail-timeout.patch
autofs-5.1.2-set-sane-default-master-read-wait-timeout.patch
autofs-5.1.2-dont-return-until-after-master-map-retry-read.patch
autofs-5.1.2-make-lookup_nss_read_master-return-nss-status.patch

found at https://www.kernel.org/pub/linux/daemons/autofs/v5/patches-5.1.3/.

It's true they are mostly related to startup problems when the network is
not yet available and they have an annoying side effect of being unforgiving
of mis-configured nsswitch.conf (a file that has a broken configuration by
default and which autofs doesn't own and probably won't change to fix it)
but overall I think they are an improvement.

Do you have these in your test package?

I can't reproduce it with a daemon built with these and they also fixed a
lookup failure for nsswitch source ordering although it wasn't originally
intended to do so.

Ian
--
To unsubscribe from this list: send the line "unsubscribe autofs" in