Re: GSS unwrapping breaks the DRC

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Apr 15, 2020, at 3:25 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> 
> On Wed, Apr 15, 2020 at 01:05:11PM -0400, Chuck Lever wrote:
>> Hi Bruce and Jeff:
>> 
>> Testing intensive workloads with NFSv3 and NFSv4.0 on NFS/RDMA with krb5i
>> or krb5p results in a pretty quick workload failure. Closer examination
>> shows that the client is able to overrun the GSS sequence window with
>> some regularity. When that happens, the server drops the connection.
>> 
>> However, when the client retransmits requests with lost replies, they
>> never hit in the DRC, and that results in unexpected failures of non-
>> idempotent requests.
>> 
>> The retransmitted XIDs are found in the DRC, but the retransmitted request
>> has a different checksum than the original. We're hitting the "mismatch"
>> case in nfsd_cache_key_cmp for these requests.
>> 
>> I tracked down the problem to the way the DRC computes the length of the
>> part of the buffer it wants to checksum. nfsd_cache_csum uses
>> 
>>  head.iov_len + page_len
>> 
>> and then caps that at RC_CSUMLEN.
>> 
>> That works fine for krb5 and sys, but the GSS unwrap functions
>> (integ_unwrap_data and priv_unwrap_data) don't appear to update head.iov_len
>> properly. So nfsd_cache_csum's length computation is significantly larger
>> than the clear-text message, and that allows stale parts of the xdr_buf
>> to be included in the checksum.
>> 
>> Using xdr_buf_subsegment() at the end of integ_unwrap_data sets the xdr_buf
>> lengths properly and fixes the situation for krb5i.
>> 
>> I don't see a similar solution for priv_unwrap_data: there's no MIC len
>> available, and priv_len is not the actual length of the clear-text message.
>> 
>> Moreover, the comment in fix_priv_head() is disturbing. I don't see anywhere
>> where the relationship between the buf's head/len and how svc_defer works is
>> authoritatively documented. It's not clear exactly how priv_unwrap_data is
>> supposed to accommodate svc_defer, or whether integ_unwrap_data also needs
>> to accommodate it.
>> 
>> So I can't tell if the GSS unwrap functions are wrong or if there's a more
>> accurate way to compute the message length in nfsd_cache_csum. I suspect
>> both could use some improvement, but I'm not certain exactly what that
>> might be.
> 
> I don't know, I tried looking through that code and didn't get any
> further than you.  The gss unwrap code does look suspect to me.  It
> needs some kind of proper design, as it stands it's just an accumulation
> of fixes.

Having recently completed overhauling the client-side equivalents, I
agree with you there.

I've now convinced myself that because nfsd_cache_csum might need to
advance into the first page of the Call message, rq_arg.head.iov_len
must contain an accurate length so that csum_partial is applied
correctly to the head buffer.

Therefore it is the preceding code that needs to set up rq_arg.head.iov_len
correctly. The GSS unwrap functions have to do this, and therefore these
must be fixed. I would theorize that svc_defer also depends on head.iov_len
being set correctly.

As far as how rq_arg.len needs to be set for svc_defer, I think I need
to have some kind of test case to examine how that path is triggered. Any
advice appreciated.


--
Chuck Lever







[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux