Re: SETCLIENTID acceptor

> On May 14, 2018, at 1:26 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> 
> On Fri, May 11, 2018 at 4:57 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>> 
>> 
>>> On May 10, 2018, at 4:58 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>> 
>>> On Thu, May 10, 2018 at 3:23 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>>> 
>>>> 
>>>>> On May 10, 2018, at 3:07 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>> 
>>>>> On Thu, May 10, 2018 at 2:09 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>>>>> 
>>>>>> 
>>>>>>> On May 10, 2018, at 1:40 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>>> 
>>>>>>> On Wed, May 9, 2018 at 5:19 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>>>>>>> I'm right on the edge of my understanding of how this all works.
>>>>>>>> 
>>>>>>>> I've re-keyed my NFS server. Now on my client, I'm seeing this on
>>>>>>>> vers=4.0,sec=sys mounts:
>>>>>>>> 
>>>>>>>> May  8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>>> May  8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>>> May  8 16:40:30 manet kernel: NFS: NFSv4 callback contains invalid cred
>>>>>>>> 
>>>>>>>> manet is my client, and klimt is my server. I'm mounting with
>>>>>>>> NFS/RDMA, so I'm mounting hostname klimt.ib, not klimt.
>>>>>>>> 
>>>>>>>> Because the client is using krb5i for lease management, the server
>>>>>>>> is required to use krb5i for the callback channel (S 3.3.3 of RFC
>>>>>>>> 7530).
>>>>>>>> 
>>>>>>>> After a SETCLIENTID, the client copies the acceptor from the GSS
>>>>>>>> context it set up, and uses that to check incoming callback
>>>>>>>> requests. I instrumented the client's SETCLIENTID proc, and I see
>>>>>>>> this:
>>>>>>>> 
>>>>>>>> check_gss_callback_principal: acceptor=nfs@xxxxxxxxxxxxxxxxxxxxxxxx, principal=host@xxxxxxxxxxxxxxxxxxxxx
>>>>>>>> 
>>>>>>>> The principal strings are not equal, and that's why the client
>>>>>>>> believes the callback credential is bogus. Now I'm trying to
>>>>>>>> figure out whether it is the server's callback client or the
>>>>>>>> client's callback server that is misbehaving.
>>>>>>>> 
>>>>>>>> To me, the server's callback principal (host@klimt) seems like it
>>>>>>>> is correct. The client would identify as host@manet when making
>>>>>>>> calls to the server, for example, so I'd expect the server to
>>>>>>>> behave similarly when performing callbacks.
>>>>>>>> 
>>>>>>>> Can anyone shed more light on this?
>>>>>>> 
>>>>>>> What are your full hostnames of each machine and does the reverse
>>>>>>> lookup from the ip to hostname on each machine give you what you
>>>>>>> expect?
>>>>>>> 
>>>>>>> Sounds like all of them need to resolve to <>.ib.1015granger.net,
>>>>>>> but somewhere you are getting <>.1015granger.net instead.
>>>>>> 
>>>>>> The forward and reverse mappings are consistent, and rdns is
>>>>>> disabled in my krb5.conf files. My server is multi-homed; it
>>>>>> has a 1GbE interface (klimt.1015granger.net); an FDR IB
>>>>>> interface (klimt.ib.1015granger.net); and a 25 GbE interface
>>>>>> (klimt.roce.1015granger.net).
>>>>> 
>>>>> Ah, so you are keeping it very interesting...
>>>>> 
>>>>>> My theory is that the server needs to use the same principal
>>>>>> for callback operations that the client used for lease
>>>>>> establishment. The last paragraph of S3.3.3 seems to state
>>>>>> that requirement, though it's not especially clear; and the
>>>>>> client has required it since commit f11b2a1cfbf5 (2014).
>>>>>> 
>>>>>> So the server should authenticate as nfs@xxxxxxxx and not
>>>>>> host@klimt, in this case, when performing callback requests.
>>>>> 
>>>>> Yes I agree that server should have authenticated as nfs@xxxxxxxx and
>>>>> that's what I see in my (simple) single home setup.
>>>>> 
>>>>> In nfs-utils there is code that deals with the callback and comment
>>>>> about choices for the principal:
>>>>>       * Restricting gssd to use "nfs" service name is needed for when
>>>>>       * the NFS server is doing a callback to the NFS client.  In this
>>>>>       * case, the NFS server has to authenticate itself as "nfs" --
>>>>>       * even if there are other service keys such as "host" or "root"
>>>>>       * in the keytab.
>>>>> So the upcall for the callback should have specifically specified
>>>>> "nfs" to look for nfs/<hostname>. The question is, if your keytab has
>>>>> both:
>>>>> nfs/klimt and nfs/klimt.ib, how does it choose which one to take? I'm
>>>>> not sure. But I guess in your case you are seeing that it chose
>>>>> "host/<>", which would really be an nfs-utils bug.
>>>> 
>>>> I think the upcall is correctly requesting an nfs/ principal
>>>> (see below).
>>>> 
>>>> Not only does it need to choose an nfs/ principal, but it also
>>>> has to pick the correct domain name. The domain name does not
>>>> seem to be passed up to gssd. fs/nfsd/nfs4state.c has this:
>>>> 
>>>> 749 static struct rpc_cred *callback_cred;
>>>> 750
>>>> 751 int set_callback_cred(void)
>>>> 752 {
>>>> 753         if (callback_cred)
>>>> 754                 return 0;
>>>> 755         callback_cred = rpc_lookup_machine_cred("nfs");
>>>> 756         if (!callback_cred)
>>>> 757                 return -ENOMEM;
>>>> 758         return 0;
>>>> 759 }
>>>> 760
>>>> 761 void cleanup_callback_cred(void)
>>>> 762 {
>>>> 763         if (callback_cred) {
>>>> 764                 put_rpccred(callback_cred);
>>>> 765                 callback_cred = NULL;
>>>> 766         }
>>>> 767 }
>>>> 768
>>>> 769 static struct rpc_cred *get_backchannel_cred(struct nfs4_client *clp, struct rpc_clnt *client, struct nfsd4_session *ses)
>>>> 770 {
>>>> 771         if (clp->cl_minorversion == 0) {
>>>> 772                 return get_rpccred(callback_cred);
>>>> 773         } else {
>>>> 774                 struct rpc_auth *auth = client->cl_auth;
>>>> 775                 struct auth_cred acred = {};
>>>> 776
>>>> 777                 acred.uid = ses->se_cb_sec.uid;
>>>> 778                 acred.gid = ses->se_cb_sec.gid;
>>>> 779                 return auth->au_ops->lookup_cred(client->cl_auth, &acred, 0);
>>>> 780         }
>>>> 781 }
>>>> 
>>>> rpc_lookup_machine_cred("nfs"); should request an "nfs/" service
>>>> principal, shouldn't it?
>>>> 
>>>> Though I think this approach is incorrect. The server should not
>>>> use the machine cred here, it should use a credential based on
>>>> the principal the client used to establish its lease.
>>>> 
>>>> 
>>>>> What's in your server's keytab?
>>>> 
>>>> [root@klimt ~]# klist -ke /etc/krb5.keytab
>>>> Keytab name: FILE:/etc/krb5.keytab
>>>> KVNO Principal
>>>> ---- --------------------------------------------------------------------------
>>>>  4 host/klimt.1015granger.net@xxxxxxxxxxxxxxx (aes256-cts-hmac-sha1-96)
>>>>  4 host/klimt.1015granger.net@xxxxxxxxxxxxxxx (aes128-cts-hmac-sha1-96)
>>>>  4 host/klimt.1015granger.net@xxxxxxxxxxxxxxx (des3-cbc-sha1)
>>>>  4 host/klimt.1015granger.net@xxxxxxxxxxxxxxx (arcfour-hmac)
>>>>  3 nfs/klimt.1015granger.net@xxxxxxxxxxxxxxx (aes256-cts-hmac-sha1-96)
>>>>  3 nfs/klimt.1015granger.net@xxxxxxxxxxxxxxx (aes128-cts-hmac-sha1-96)
>>>>  3 nfs/klimt.1015granger.net@xxxxxxxxxxxxxxx (des3-cbc-sha1)
>>>>  3 nfs/klimt.1015granger.net@xxxxxxxxxxxxxxx (arcfour-hmac)
>>>>  3 nfs/klimt.ib.1015granger.net@xxxxxxxxxxxxxxx (aes256-cts-hmac-sha1-96)
>>>>  3 nfs/klimt.ib.1015granger.net@xxxxxxxxxxxxxxx (aes128-cts-hmac-sha1-96)
>>>>  3 nfs/klimt.ib.1015granger.net@xxxxxxxxxxxxxxx (des3-cbc-sha1)
>>>>  3 nfs/klimt.ib.1015granger.net@xxxxxxxxxxxxxxx (arcfour-hmac)
>>>>  3 nfs/klimt.roce.1015granger.net@xxxxxxxxxxxxxxx (aes256-cts-hmac-sha1-96)
>>>>  3 nfs/klimt.roce.1015granger.net@xxxxxxxxxxxxxxx (aes128-cts-hmac-sha1-96)
>>>>  3 nfs/klimt.roce.1015granger.net@xxxxxxxxxxxxxxx (des3-cbc-sha1)
>>>>  3 nfs/klimt.roce.1015granger.net@xxxxxxxxxxxxxxx (arcfour-hmac)
>>>> [root@klimt ~]#
>>>> 
>>>> As a workaround, I bet moving the keys for nfs/klimt.ib to
>>>> the front of the keytab file would allow Kerberos to work
>>>> with the klimt.ib interface.
>>>> 
>>>> 
>>>>> An output from gssd -vvv would be interesting.
>>>> 
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: #012handle_gssd_upcall: 'mech=krb5 uid=0 target=host@xxxxxxxxxxxxxxxxxxxxx service=nfs enctypes=18,17,16,23,3,1,2 ' (nfsd4_cb/clnt0)
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: krb5_use_machine_creds: uid 0 tgtname host@xxxxxxxxxxxxxxxxxxxxx
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'manet.1015granger.net' is 'manet.1015granger.net'
>>>> May 10 14:43:24 klimt rpc.gssd[1191]: Full hostname for 'klimt.1015granger.net' is 'klimt.1015granger.net'
>>> 
>>> I think that's the problem. This should have been
>>> klimt.ib.1015granger.net. nfs-utils just calls gethostname() to get
>>> the local host name, and this is what it'll match against the keytab
>>> entries. So I think even if you move the keytab entries around, it
>>> probably will still pick nfs@xxxxxxxxxxxxxxxxxxxxx.
>> 
>> mount.nfs has a helper function called nfs_ca_sockname() that does a
>> connect/getsockname dance to derive the local host's hostname as it
>> is seen by the other end of the connection. So in this case, the
>> server's gssd would get the client's name, "manet.ib.1015granger.net"
>> and the "nfs" service name, and would correctly derive the service
>> principal "nfs/klimt.ib.1015granger.net" based on that.
>> 
>> Would it work if gssd did this instead of using gethostname(3) ? Then
>> the kernel wouldn't have to pass the correct principal up to gssd, it
>> would be able to derive it by itself.
> 
> I'd need to remind myself of how all of this works before I could
> confidently answer this. We are currently passing "target=" from the
> kernel as well as doing gethostbyname() in gssd. Why? I don't know,
> and I need to figure out what each piece really accomplishes.
> 
> I would think if the kernel could provide us with the correct domain
> name (as it knows over which interface the request came in), then gssd
> should just be using that instead of querying the name on its own.

I didn't see a target field, but I didn't look that closely.

The credential created by the kernel for this purpose does
not appear to provide more than "nfs" as the service
principal. Changing gssd as I describe above seems to help
the situation (on the server at least; I don't know what it
would do to the client).

It looks like the same cred is used for all NFSv4.0 callback
channels. That at least will need a code change to make
multi-homing work properly with Kerberos.

I'm not claiming that I have a long term solution here. I'm
just reporting my experimental results :-)
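For reference, the connect/getsockname dance mentioned above can be sketched
as below. The real nfs_ca_sockname() in nfs-utils differs in detail, and the
peer address is a stand-in; note that a UDP connect() sends no packets, it
only selects a route and therefore the local source address:

```c
/* Sketch of the connect/getsockname technique: connect a UDP socket
 * to the peer, ask the kernel which local address it chose for that
 * route, then map the address back to a hostname.  This is how a
 * multi-homed host can learn the name the peer sees; it is not the
 * actual nfs_ca_sockname() from nfs-utils. */
#include <stdio.h>
#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>

static int local_name_for_peer(const char *peer, char *buf, size_t len)
{
    struct addrinfo hints = { .ai_socktype = SOCK_DGRAM };
    struct addrinfo *ai = NULL;
    struct sockaddr_storage local;
    socklen_t locallen = sizeof(local);
    int sock, err;

    if (getaddrinfo(peer, NULL, &hints, &ai))
        return -1;
    sock = socket(ai->ai_family, SOCK_DGRAM, 0);
    if (sock < 0) {
        freeaddrinfo(ai);
        return -1;
    }
    /* A UDP connect() transmits nothing; it only picks the route,
     * and with it the local source address. */
    err = connect(sock, ai->ai_addr, ai->ai_addrlen) ||
          getsockname(sock, (struct sockaddr *)&local, &locallen) ||
          getnameinfo((struct sockaddr *)&local, locallen,
                      buf, len, NULL, 0, 0);
    close(sock);
    freeaddrinfo(ai);
    return err ? -1 : 0;
}

int main(void)
{
    char host[1025];

    /* "127.0.0.1" stands in for the NFS client's address. */
    if (local_name_for_peer("127.0.0.1", host, sizeof(host)) == 0)
        printf("local name as seen by peer: %s\n", host);
    return 0;
}
```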


> Btw, what happened after you turned off gssproxy? Did you get
> further in getting the "nfs" and not "host" identity used?

I erased the gssproxy cache, and that appears to have fixed
the client misbehavior. I'm still using gssproxy, and I was
able to use NFSv4.0 with Kerberos on my TCP-only i/f, then
on my IB i/f, then on my RoCE i/f without notable problems.

Since gssproxy is the default configuration on RHEL 7-based
systems, I think we want to make gssproxy work rather than
disabling it -- unless there is some serious structural 
problem that will prevent it from ever working right.


--
Chuck Lever






