nfs-idmapd startup race

Hi all,

I've been debugging an NFS server issue where ID mapping did not work correctly unless I restarted nfs-kernel-server and re-exported the shares shortly after reboot. The main symptom is the following log output from nfs-idmapd.service:

Mar 08 22:45:59 343guiltyspark.nub.lan systemd[1]: Starting NFSv4 ID-name mapping service...
Mar 08 22:45:59 343guiltyspark.nub.lan rpc.idmapd[620]: libnfsidmap: Unable to determine the NFSv4 domain; Using 'localdomain' as the NFSv4 domain which means UIDs will be mapped to the 'Nobody-User' user defined in /etc/idmapd.conf
Mar 08 22:45:59 343guiltyspark.nub.lan rpc.idmapd[620]: rpc.idmapd: libnfsidmap: Unable to determine the NFSv4 domain; Using 'localdomain' as the NFSv4 domain which means UIDs will be mapped to the 'Nobody-User' user defined in /etc/idmapd.conf
Mar 08 22:45:59 343guiltyspark.nub.lan rpc.idmapd[620]: rpc.idmapd: libnfsidmap: using (default) domain: localdomain
Mar 08 22:45:59 343guiltyspark.nub.lan rpc.idmapd[620]: rpc.idmapd: libnfsidmap: Realms list: 'LOCALDOMAIN'
Mar 08 22:45:59 343guiltyspark.nub.lan rpc.idmapd[620]: rpc.idmapd: libnfsidmap: loaded plugin /lib/x86_64-linux-gnu/libnfsidmap/nsswitch.so for method nsswitch

I wrote a little test program to mimic libnfsidmap's domain_from_dns() function, which is where the above message comes from:

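#include <errno.h>
#include <netdb.h>   /* gethostbyname(), hstrerror(), h_errno */
#include <stdio.h>
#include <unistd.h>  /* gethostname() */

int main(void) {
    struct hostent *he;
    char hname[64];

    /* Step 1: read the configured hostname; no resolver involved. */
    if (gethostname(hname, sizeof(hname))) {
        printf("gethostname error: %d\n", errno);
        return 1;  /* hname is undefined here, so don't try to resolve it */
    }
    printf("gethostname: '%s'\n", hname);

    /* Step 2: resolve that hostname; this is the call that fails at boot. */
    if ((he = gethostbyname(hname)) == NULL)
        printf("gethostbyname error: '%s'\n", hstrerror(h_errno));
    else
        printf("gethostbyname h_name: '%s'\n", he->h_name);

    return 0;
}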

I then added it as an ExecStartPre= command in nfs-idmapd.service. The output is:

gethostname: '343guiltyspark.nub.lan'
gethostbyname error: 'Host name lookup failure'

It seems DNS resolution isn't working yet when the service starts, so I added Wants=network-online.target and After=network-online.target to the unit. It still fails. But if I then add a "sleep 1" to ExecStartPre=, everything starts up correctly.
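
For reference, the unit changes described above could be expressed as a drop-in along these lines. This is only a sketch: the drop-in file name and the /bin/sleep path are assumptions and may differ on other systems.

```ini
# Assumed location: /etc/systemd/system/nfs-idmapd.service.d/override.conf
[Unit]
Wants=network-online.target
After=network-online.target

[Service]
# Workaround only: gives the resolver a moment to become usable.
# /bin/sleep is an assumed path.
ExecStartPre=/bin/sleep 1
```

After editing, "systemctl daemon-reload" is needed for the drop-in to take effect.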

Obviously there are several workarounds, including the above and setting the domain manually in /etc/idmapd.conf. But on principle I'd like to fix the underlying race condition and help others avoid the same issue.

I'm hoping someone can answer my open questions:

1. Why does libnfsidmap use gethostname() and gethostbyname() (i.e. why does it need a DNS lookup on the hostname)?

2. nfs-server.service already has a dependency on network-online.target, but nfs-idmapd.service does not (and it starts first). Since id mapping can depend on DNS resolution (and seems to out of the box), why not add the dependency to the latter as well?

3. Since the network-online.target doesn't completely solve the issue, any ideas on how to fix the startup race without something haphazard like a "sleep"?

Thanks,

Aram



