> On Apr 15, 2020, at 6:23 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> 
> 
> 
>> On Apr 15, 2020, at 5:58 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>> 
>> On Wed, Apr 15, 2020 at 04:06:17PM -0400, Chuck Lever wrote:
>>> 
>>> 
>>>> On Apr 15, 2020, at 3:25 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>>>> 
>>>> On Wed, Apr 15, 2020 at 01:05:11PM -0400, Chuck Lever wrote:
>>>>> Hi Bruce and Jeff:
>>>>> 
>>>>> Testing intensive workloads with NFSv3 and NFSv4.0 on NFS/RDMA with
>>>>> krb5i or krb5p results in a pretty quick workload failure. Closer
>>>>> examination shows that the client is able to overrun the GSS sequence
>>>>> window with some regularity. When that happens, the server drops the
>>>>> connection.
>>>>> 
>>>>> However, when the client retransmits requests with lost replies, they
>>>>> never hit in the DRC, and that results in unexpected failures of
>>>>> non-idempotent requests.
>>>>> 
>>>>> The retransmitted XIDs are found in the DRC, but the retransmitted
>>>>> request has a different checksum than the original. We're hitting the
>>>>> "mismatch" case in nfsd_cache_key_cmp for these requests.
>>>>> 
>>>>> I tracked down the problem to the way the DRC computes the length of
>>>>> the part of the buffer it wants to checksum. nfsd_cache_csum uses
>>>>> 
>>>>>    head.iov_len + page_len
>>>>> 
>>>>> and then caps that at RC_CSUMLEN.
>>>>> 
>>>>> That works fine for krb5 and sys, but the GSS unwrap functions
>>>>> (integ_unwrap_data and priv_unwrap_data) don't appear to update
>>>>> head.iov_len properly. So nfsd_cache_csum's length computation is
>>>>> significantly larger than the clear-text message, and that allows
>>>>> stale parts of the xdr_buf to be included in the checksum.
>>>>> 
>>>>> Using xdr_buf_subsegment() at the end of integ_unwrap_data sets the
>>>>> xdr_buf lengths properly and fixes the situation for krb5i.
>>>>> 
>>>>> I don't see a similar solution for priv_unwrap_data: there's no MIC
>>>>> len available, and priv_len is not the actual length of the
>>>>> clear-text message.
>>>>> 
>>>>> Moreover, the comment in fix_priv_head() is disturbing. I don't see
>>>>> anywhere where the relationship between the buf's head/len and how
>>>>> svc_defer works is authoritatively documented. It's not clear exactly
>>>>> how priv_unwrap_data is supposed to accommodate svc_defer, or whether
>>>>> integ_unwrap_data also needs to accommodate it.
>>>>> 
>>>>> So I can't tell if the GSS unwrap functions are wrong or if there's a
>>>>> more accurate way to compute the message length in nfsd_cache_csum. I
>>>>> suspect both could use some improvement, but I'm not certain exactly
>>>>> what that might be.
>>>> 
>>>> I don't know, I tried looking through that code and didn't get any
>>>> further than you. The gss unwrap code does look suspect to me. It
>>>> needs some kind of proper design, as it stands it's just an
>>>> accumulation of fixes.
>>> 
>>> Having recently completed overhauling the client-side equivalents, I
>>> agree with you there.
>>> 
>>> I've now convinced myself that because nfsd_cache_csum might need to
>>> advance into the first page of the Call message, rq_arg.head.iov_len
>>> must contain an accurate length so that csum_partial is applied
>>> correctly to the head buffer.
>>> 
>>> Therefore it is the preceding code that needs to set up
>>> rq_arg.head.iov_len correctly. The GSS unwrap functions have to do
>>> this, and therefore these must be fixed. I would theorize that
>>> svc_defer also depends on head.iov_len being set correctly.
>>> 
>>> As far as how rq_arg.len needs to be set for svc_defer, I think I need
>>> to have some kind of test case to examine how that path is triggered.
>>> Any advice appreciated.
>> 
>> It's triggered on upcalls, so for example if you flush the export
>> caches with exportfs -f and then send an rpc with a filehandle, it
>> should call svc_defer on that request.
> 
> /me puts a brown paper bag over his head
> 
> Reverting 241b1f419f0e ("SUNRPC: Remove xdr_buf_trim()") seems to fix
> both krb5i and krb5p.
> 
> I'll post an official patch once I've done a little more testing. I
> promise to get the Fixes: tag right :-)

I've hit a snag here. I reverted 241b1f419f0e on my server, and all tests
completed successfully. I reverted 241b1f419f0e on my client, and now
krb5p is failing.

Using xdr_buf_trim does the right thing on the server, but on the client
it has exposed a latent bug in gss_unwrap_resp_priv() (i.e., the bug does
not appear to be harmful until 241b1f419f0e has been reverted). The
calculation of au_ralign in that function is wrong, and that forces
rpc_prepare_reply_pages to allocate a zero-length tail. xdr_buf_trim()
then lops off the end of each subsequent clear-text RPC message, and
eventually a short READ results in test failures.

After experimenting with this for a day, I don't see any way for
gss_unwrap_resp_priv() to estimate au_ralign based on what gss_unwrap()
has done to the xdr_buf. The kerberos_v1 unwrap method does not appear to
have any trailing checksum, for example, but v2 does.

The best approach for now seems to be to have the pseudoflavor-specific
unwrap methods return the correct ralign value. A straightforward way to
do this would be to add an int * parameter to ->gss_unwrap that would be
set to the proper value; or to hide that value somewhere in the xdr_buf.

Any other thoughts or random bits of inspiration?

--
Chuck Lever
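
P.S. Here is a rough sketch of the out-parameter idea, for illustration
only; the types and names below are stand-ins rather than the actual
in-kernel gss_api structures:

    /*
     * Sketch only: each pseudoflavor-specific unwrap method knows how
     * much head room its token consumed, so it can report the correct
     * realignment back to the caller instead of the caller trying to
     * estimate it from the xdr_buf after the fact.
     */
    #include <stddef.h>
    #include <stdio.h>

    struct fake_xdr_buf {
            size_t head_len;        /* stand-in for head[0].iov_len */
            size_t len;             /* stand-in for the total buf->len */
    };

    /* stand-in for a ->gss_unwrap method with an added out-parameter */
    static int sketch_unwrap_v2(struct fake_xdr_buf *buf, size_t *ralign)
    {
            size_t token_hdr = 16;  /* hypothetical per-mechanism header */

            /* ... decrypt and strip the trailing checksum here ... */
            buf->head_len -= token_hdr;
            buf->len -= token_hdr;

            /* the mechanism reports its own realignment value */
            *ralign = token_hdr;
            return 0;
    }

    int main(void)
    {
            struct fake_xdr_buf buf = { .head_len = 128, .len = 4096 };
            size_t ralign;

            if (sketch_unwrap_v2(&buf, &ralign) == 0)
                    printf("clear-text realignment: %zu bytes\n", ralign);
            return 0;
    }

The appeal of the out-parameter over hiding the value in the xdr_buf is
that the xdr_buf keeps describing only the message itself, with nothing
mechanism-specific smuggled inside it.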