> On Apr 15, 2020, at 8:00 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote: > > On Wed, Apr 15, 2020 at 06:23:57PM -0400, Chuck Lever wrote: >> >> >>> On Apr 15, 2020, at 5:58 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote: >>> >>> On Wed, Apr 15, 2020 at 04:06:17PM -0400, Chuck Lever wrote: >>>> >>>> >>>>> On Apr 15, 2020, at 3:25 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote: >>>>> >>>>> On Wed, Apr 15, 2020 at 01:05:11PM -0400, Chuck Lever wrote: >>>>>> Hi Bruce and Jeff: >>>>>> >>>>>> Testing intensive workloads with NFSv3 and NFSv4.0 on NFS/RDMA with krb5i >>>>>> or krb5p results in a pretty quick workload failure. Closer examination >>>>>> shows that the client is able to overrun the GSS sequence window with >>>>>> some regularity. When that happens, the server drops the connection. >>>>>> >>>>>> However, when the client retransmits requests with lost replies, they >>>>>> never hit in the DRC, and that results in unexpected failures of non- >>>>>> idempotent requests. >>>>>> >>>>>> The retransmitted XIDs are found in the DRC, but the retransmitted request >>>>>> has a different checksum than the original. We're hitting the "mismatch" >>>>>> case in nfsd_cache_key_cmp for these requests. >>>>>> >>>>>> I tracked down the problem to the way the DRC computes the length of the >>>>>> part of the buffer it wants to checksum. nfsd_cache_csum uses >>>>>> >>>>>> head.iov_len + page_len >>>>>> >>>>>> and then caps that at RC_CSUMLEN. >>>>>> >>>>>> That works fine for krb5 and sys, but the GSS unwrap functions >>>>>> (integ_unwrap_data and priv_unwrap_data) don't appear to update head.iov_len >>>>>> properly. So nfsd_cache_csum's length computation is significantly larger >>>>>> than the clear-text message, and that allows stale parts of the xdr_buf >>>>>> to be included in the checksum. >>>>>> >>>>>> Using xdr_buf_subsegment() at the end of integ_unwrap_data sets the xdr_buf >>>>>> lengths properly and fixes the situation for krb5i. >>>>>> >>>>>> I don't see a similar solution for priv_unwrap_data: there's no MIC len >>>>>> available, and priv_len is not the actual length of the clear-text message. >>>>>> >>>>>> Moreover, the comment in fix_priv_head() is disturbing. I don't see anywhere >>>>>> where the relationship between the buf's head/len and how svc_defer works is >>>>>> authoritatively documented. It's not clear exactly how priv_unwrap_data is >>>>>> supposed to accommodate svc_defer, or whether integ_unwrap_data also needs >>>>>> to accommodate it. >>>>>> >>>>>> So I can't tell if the GSS unwrap functions are wrong or if there's a more >>>>>> accurate way to compute the message length in nfsd_cache_csum. I suspect >>>>>> both could use some improvement, but I'm not certain exactly what that >>>>>> might be. >>>>> >>>>> I don't know, I tried looking through that code and didn't get any >>>>> further than you. The gss unwrap code does look suspect to me. It >>>>> needs some kind of proper design, as it stands it's just an accumulation >>>>> of fixes. >>>> >>>> Having recently completed overhauling the client-side equivalents, I >>>> agree with you there. >>>> >>>> I've now convinced myself that because nfsd_cache_csum might need to >>>> advance into the first page of the Call message, rq_arg.head.iov_len >>>> must contain an accurate length so that csum_partial is applied >>>> correctly to the head buffer. >>>> >>>> Therefore it is the preceding code that needs to set up rq_arg.head.iov_len >>>> correctly. The GSS unwrap functions have to do this, and therefore these >>>> must be fixed. I would theorize that svc_defer also depends on head.iov_len >>>> being set correctly. >>>> >>>> As far as how rq_arg.len needs to be set for svc_defer, I think I need >>>> to have some kind of test case to examine how that path is triggered. Any >>>> advice appreciated. >>> >>> It's triggered on upcalls, so for example if you flush the export caches >>> with exports -f and then send an rpc with a filehandle, it should call >>> svc_defer on that request. >> >> /me puts a brown paper bag over his head >> >> Reverting 241b1f419f0e ("SUNRPC: Remove xdr_buf_trim()") seems to fix both >> krb5i and krb5p. > > Well, it has my ack too.... > >> I'll post an official patch once I've done a little more testing. I promise >> to get the Fixes: tag right :-) > > I did have it in the back of my mind that one of us had fixed a similar > bug before. Indeed, Jeff's: > > 4c190e2f913f "sunrpc: trim off trailing checksum before > returning decrypted or integrity authenticated buffer" > > explains exactly the bug you saw. Yep, I agree with the purpose of 4c190e2f913f. The operation of xdr_bufs is different on the client and server in many subtle ways, which I probably still don't completely understand. The piece I missed was that the server ignores buf->len in many cases, preferring instead to use the xdr_buf segment lengths. So simply updating buf->len is insufficient on the server side. > Maybe some of that changelog should move into a code comment instead. The patch I'm testing is a mechanical revert. It doesn't improve the comments, and restores a BUG_ON that 241b1f419f0e removed. I'm putting off clean-ups for the moment because I'd like to see this fix in 5.7-rc. > And I still think the code is more accidents waiting to happen. The bigger picture is updating the server to use xdr_stream throughout its XDR stack. That's the main reason I worked on the client-side GSS wrap and unwrap functions last year. Using xdr_stream would move the server and client sides closer together in style and implementation, then hopefully we could share more code. -- Chuck Lever