Clarification:

> On Sep 19, 2022, at 11:31 AM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>
> Hi-
>
> I rediscovered recently that NFSv4.0 with Kerberos does not work on
> multi-homed hosts. This is true even with sec=sys because the client
> attempts to establish a GSS context when there is a keytab present.
>
> Basically my test environment has to work for sec=sys and sec=krb*
> and for all NFS versions and minor versions. Thus I keep a keytab
> on it.
>
> Now, I have three network interfaces on my client: one RoCE, one
> IB, and one GbE. They are each on their own subnet and each has
> a unique hostname (that varies in the domain part).
>
> But mounting one of my IB or RoCE test servers with NFSv4.0 results
> in the infamous "NFSv4: Invalid callback credential" message. The
> client always uses the principal for GbE interface...

... for the forward channel, but it expects the backchannel
principal to be the acceptor that the server saw on the forward
channel.

Currently, when a Linux client mounts server.ib.example.net:

- the client uses the acceptor host@xxxxxxxxxxxxxxxxxx (if the
  keytab happens to have a host@ principal)

- the authenticates to the principal nfs@xxxxxxxxxxxxxxxxxxxxx

- the client expects to see the server authenticate to
  nfs@xxxxxxxxxxxxxxxxxxxxx as the principal on the backchannel,
  but gets host@xxxxxxxxxxxxxxxxxx instead, and
  check_gss_callback_principal() fails

IIUC, the NFS protocol expects:

- the client uses the acceptor nfs@xxxxxxxxxxxxxxxxxxxxx

- the server uses the principal nfs@xxxxxxxxxxxxxxxxxxxxx

- the client should see nfs@xxxxxxxxxxxxxxxxxxxxx as the principal
  on the backchannel

> This was working at one point, but seems to have devolved over time.
>
>
> Here are some of the problems I found:
>
> 1. The kernel always asks for service=* .
>
> If your system's keytab has only "nfs" service principals in it,
> that should be OK. If it has a "host" principal in it, that's
> going to be the first one that gssd picks up.
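
To make the service=* problem concrete, here is a toy model of the
selection, written as a minimal Python sketch. It is not gssd's
actual code: it assumes a plain first-match scan over the keytab (the
real search order may differ), and the keytab entries, hostnames, and
the helper names pick_acceptor() and callback_ok() are hypothetical.
The callback check is reduced to "the name must start with nfs@",
which is only the expectation described earlier, not what
check_gss_callback_principal() really does.

```python
from typing import List, Optional

# Hypothetical keytab contents, in file order (made-up hostnames).
KEYTAB: List[str] = [
    "host@client.example.net",
    "nfs@client.example.net",
]


def pick_acceptor(keytab: List[str], service: str) -> Optional[str]:
    """Return the first keytab entry whose service part matches.

    service="*" accepts any service name, standing in for the wildcard
    upcall. Assumption: a simple first-match scan in keytab order.
    """
    for entry in keytab:
        svc, _, _host = entry.partition("@")
        if service == "*" or svc == service:
            return entry
    return None


def callback_ok(acceptor: Optional[str]) -> bool:
    """Toy stand-in for the callback check: only nfs@ names pass."""
    return acceptor is not None and acceptor.startswith("nfs@")


if __name__ == "__main__":
    for service in ("*", "nfs"):
        chosen = pick_acceptor(KEYTAB, service)
        print(f"service={service}: {chosen}, callback ok: {callback_ok(chosen)}")
```

In this model the wildcard picks up the host@ entry and the callback
check fails, while asking for "nfs" explicitly selects the nfs@ entry
no matter what else shares the keytab.
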
>
> NFSv4.0 callback does not work with a host@ acceptor -- it wants
> nfs@.
>
> There are two possible workarounds:
>
> a. Remove all but the nfs@ keys from your system's keytab.
>
> b. Modify the kernel to use "service=nfs" in the upcall.
>
> I favor b. The NFS specifications do not appear to require it,
> but they suggest that an "nfs@" principal is always to be used
> for protecting NFS with GSS.

And: the NFS callback channel is an NFS service that needs to use
an nfs@ service principal. So when the server attempts to
authenticate to the client's callback service, it always needs
to use nfs@.

> But more importantly, other subsystems share the keytab with
> NFS. They might want a root@ or host@ key in there too, and
> that will break NFSv4.0.
>
>
> 2. nsswitch.conf::hosts now has a "myhostname" service, and it's
> placed before the "resolve" service by default.
>
> I enabled systemd-resolved on my systems, to be part of the future.
> Yeah, I know, right?
>
> Now, a DNS query for the hostname associated with any of my system's
> IP addresses (and there are several) always resolves to the One True
> hostname. So gssd always gets the wrong principal when mounting via
> alternate network interfaces.
>
> Moving "myhostname" after "resolve" seems to address this issue, but
> I'm told that this will be reverted if I reconfigure the resolver or
> update the system?
>
> The bugs I found that document this issue keep getting closed because
> they target a specific Fedora version which always gets EOL'd after
> a year.
>
>
> 3. gssproxy gets the acceptor name wrong.
>
> It has the same problem as in 2, even with the nsswitch.conf
> workaround in place. So gssproxy returns the same principal for every
> network interface on the system, and that breaks NFSv4.0 callback.
>
> Note also that adding "use-gss-proxy=0" to /etc/nfs.conf does not
> appear to disable gssproxy.
> I had to boot up and then "sudo systemctl
> stop gssproxy" and even then, the kernel still tries to make upcalls
> to it.
>
> I noticed that setting the gssd debugging options in /etc/nfs.conf
> also has no effect. I had to edit the gssd service files to get
> debugging information
>
> I'm not sure how to fix this one -- I'd like to see gssproxy
> fixed to deal with this correctly, but also whatever reads
> /etc/nfs.conf needs to get fixed so that the gssd settings in
> that file are observed.
>
>
> Any opinions or guidance appreciated, especially from maintainers
> (like, aw hell naw, or yep that's broken, send a patch).

Another possibility would be to make check_gss_callback_principal()
more flexible.

--
Chuck Lever