NFSv4.0 callback with Kerberos not working

Hi-

I rediscovered recently that NFSv4.0 with Kerberos does not work on
multi-homed hosts. This is true even with sec=sys because the client
attempts to establish a GSS context when there is a keytab present.

Basically my test environment has to work for sec=sys and sec=krb*
and for all NFS versions and minor versions. Thus I keep a keytab
on it.

Now, I have three network interfaces on my client: one RoCE, one
IB, and one GbE. They are each on their own subnet and each has
a unique hostname (that varies in the domain part).

But mounting one of my IB or RoCE test servers with NFSv4.0 results
in the infamous "NFSv4: Invalid callback credential" message. The
client always uses the principal for the GbE interface.

This was working at one point, but seems to have regressed over time.


Here are some of the problems I found:

1. The kernel always asks for service=* .

If your system's keytab has only "nfs" service principals in it,
that should be OK. If it has a "host" principal in it, that's
going to be the first one that gssd picks up.

NFSv4.0 callback does not work with a host@ acceptor -- it wants
nfs@.

There are two possible workarounds:

a. Remove all but the nfs@ keys from your system's keytab.

b. Modify the kernel to use "service=nfs" in the upcall.

I favor b. The NFS specifications do not appear to require it,
but they suggest that an "nfs@" principal is always to be used
for protecting NFS with GSS.

But more importantly, other subsystems share the keytab with
NFS. They might want a root@ or host@ key in there too, and
that will break NFSv4.0.
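
To see whether a keytab is exposed to this, one can list which of its
principals are not nfs/ principals. The following is only a minimal sketch:
it assumes the plain two-column output of `klist -k` (KVNO, then principal),
and the host names and realm in the usage note are hypothetical. It does not
model how gssd actually chooses a principal; it only reports what is in the
keytab.

```python
def keytab_principals(klist_output):
    """Return the principals listed in plain `klist -k` output, in order."""
    principals = []
    for line in klist_output.splitlines():
        fields = line.split()
        # Entry lines start with a numeric KVNO; headers and rules do not.
        if len(fields) >= 2 and fields[0].isdigit():
            principals.append(fields[1])
    return principals


def non_nfs_principals(klist_output):
    """Return every principal whose service part is not 'nfs'."""
    return [p for p in keytab_principals(klist_output)
            if not p.startswith("nfs/")]
```

Fed the text of `klist -k` for a keytab holding, say,
host/client.example.com@EXAMPLE.COM alongside two nfs/ entries, the second
function returns only the host/ entry, which is the kind of key that
workaround (a) would remove.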


2. nsswitch.conf::hosts now has a "myhostname" service, and it's
placed before the "resolve" service by default.

I enabled systemd-resolved on my systems, to be part of the future.
Yeah, I know, right?

Now, a DNS query for the hostname associated with any of my system's
IP addresses (and there are several) always resolves to the One True
hostname. So gssd always gets the wrong principal when mounting via
alternate network interfaces.
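
A quick way to observe this from user space is to reverse-resolve each
local address and compare the names. This is a rough sketch, not what gssd
itself does: it assumes that Python's socket.gethostbyaddr() goes through
the same NSS "hosts" configuration on the system in question, and the
addresses passed in are whatever the interfaces actually carry.

```python
import socket


def reverse_names(addresses, lookup=socket.gethostbyaddr):
    """Map each address to the host name the resolver returns, or None."""
    names = {}
    for addr in addresses:
        try:
            # gethostbyaddr() returns (hostname, aliases, addresses).
            names[addr] = lookup(addr)[0]
        except OSError:
            # socket.herror and socket.gaierror are OSError subclasses.
            names[addr] = None
    return names
```

If every address of a multi-homed client maps to the same name here, the
per-interface principals in the keytab cannot be matched by host name.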

Moving "myhostname" after "resolve" seems to address this issue, but
I'm told that this will be reverted if I reconfigure the resolver or
update the system?
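
Since the ordering may be reverted behind one's back, a small check of the
"hosts" line can be handy. The sketch below assumes the usual nsswitch.conf
syntax (service names separated by whitespace, with optional bracketed
action items such as [!UNAVAIL=return]); the function names are made up for
illustration.

```python
import re


def hosts_services(nsswitch_text):
    """Return the service names on the 'hosts:' line, in order."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("hosts:"):
            continue
        rest = line[len("hosts:"):]
        # Remove bracketed action items; they are not services.
        rest = re.sub(r"\[[^\]]*\]", " ", rest)
        return rest.split()
    return []


def myhostname_before_resolve(nsswitch_text):
    """True if both services are listed and myhostname comes first."""
    services = hosts_services(nsswitch_text)
    if "myhostname" not in services or "resolve" not in services:
        return False
    return services.index("myhostname") < services.index("resolve")
```

Running myhostname_before_resolve() on the contents of /etc/nsswitch.conf
after a system update would show whether the workaround is still in place.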

The bugs I found that document this issue keep getting closed because
they target a specific Fedora version which always gets EOL'd after
a year.


3. gssproxy gets the acceptor name wrong.

It has the same problem as in 2, even with the nsswitch.conf
workaround in place. So gssproxy returns the same principal for every
network interface on the system, and that breaks NFSv4.0 callback.

Note also that adding "use-gss-proxy=0" to /etc/nfs.conf does not
appear to disable gssproxy. I had to boot up and then "sudo systemctl
stop gssproxy" and even then, the kernel still tries to make upcalls
to it.

I noticed that setting the gssd debugging options in /etc/nfs.conf
also has no effect. I had to edit the gssd service files to get
debugging information.

I'm not sure how to fix this one -- I'd like to see gssproxy
fixed to deal with this correctly, but also whatever reads
/etc/nfs.conf needs to get fixed so that the gssd settings in
that file are observed.
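
When chasing this, it helps to separate "what the file says" from "what the
daemons honor". The sketch below only covers the first half: it reads the
[gssd] section out of nfs.conf-style text. It assumes the file is plain
INI-like text that Python's configparser can digest (no include directives),
and that the options of interest sit in a [gssd] section.

```python
import configparser


def gssd_settings(nfs_conf_text):
    """Return the key/value pairs found in the [gssd] section, if any."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nfs_conf_text)
    if not parser.has_section("gssd"):
        return {}
    return dict(parser.items("gssd"))
```

Comparing this output with the command line the gssd service was actually
started with is one way to confirm that a setting in the file is not being
picked up.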


Any opinions or guidance appreciated, especially from maintainers
(like, aw hell naw, or yep that's broken, send a patch).


--
Chuck Lever






