On Wed, Apr 15, 2020 at 04:06:17PM -0400, Chuck Lever wrote:
> 
> 
> > On Apr 15, 2020, at 3:25 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> > 
> > On Wed, Apr 15, 2020 at 01:05:11PM -0400, Chuck Lever wrote:
> >> Hi Bruce and Jeff:
> >> 
> >> Testing intensive workloads with NFSv3 and NFSv4.0 on NFS/RDMA with krb5i
> >> or krb5p results in a pretty quick workload failure. Closer examination
> >> shows that the client is able to overrun the GSS sequence window with
> >> some regularity. When that happens, the server drops the connection.
> >> 
> >> However, when the client retransmits requests with lost replies, they
> >> never hit in the DRC, and that results in unexpected failures of non-
> >> idempotent requests.
> >> 
> >> The retransmitted XIDs are found in the DRC, but the retransmitted request
> >> has a different checksum than the original. We're hitting the "mismatch"
> >> case in nfsd_cache_key_cmp for these requests.
> >> 
> >> I tracked down the problem to the way the DRC computes the length of the
> >> part of the buffer it wants to checksum. nfsd_cache_csum uses
> >> 
> >>    head.iov_len + page_len
> >> 
> >> and then caps that at RC_CSUMLEN.
> >> 
> >> That works fine for krb5 and sys, but the GSS unwrap functions
> >> (integ_unwrap_data and priv_unwrap_data) don't appear to update head.iov_len
> >> properly. So nfsd_cache_csum's length computation is significantly larger
> >> than the clear-text message, and that allows stale parts of the xdr_buf
> >> to be included in the checksum.
> >> 
> >> Using xdr_buf_subsegment() at the end of integ_unwrap_data sets the xdr_buf
> >> lengths properly and fixes the situation for krb5i.
> >> 
> >> I don't see a similar solution for priv_unwrap_data: there's no MIC len
> >> available, and priv_len is not the actual length of the clear-text message.
> >> 
> >> Moreover, the comment in fix_priv_head() is disturbing. I don't see anywhere
> >> where the relationship between the buf's head/len and how svc_defer works is
> >> authoritatively documented. It's not clear exactly how priv_unwrap_data is
> >> supposed to accommodate svc_defer, or whether integ_unwrap_data also needs
> >> to accommodate it.
> >> 
> >> So I can't tell if the GSS unwrap functions are wrong or if there's a more
> >> accurate way to compute the message length in nfsd_cache_csum. I suspect
> >> both could use some improvement, but I'm not certain exactly what that
> >> might be.
> > 
> > I don't know, I tried looking through that code and didn't get any
> > further than you. The gss unwrap code does look suspect to me. It
> > needs some kind of proper design, as it stands it's just an accumulation
> > of fixes.
> 
> Having recently completed overhauling the client-side equivalents, I
> agree with you there.
> 
> I've now convinced myself that because nfsd_cache_csum might need to
> advance into the first page of the Call message, rq_arg.head.iov_len
> must contain an accurate length so that csum_partial is applied
> correctly to the head buffer.
> 
> Therefore it is the preceding code that needs to set up rq_arg.head.iov_len
> correctly. The GSS unwrap functions have to do this, and therefore these
> must be fixed. I would theorize that svc_defer also depends on head.iov_len
> being set correctly.
> 
> As far as how rq_arg.len needs to be set for svc_defer, I think I need
> to have some kind of test case to examine how that path is triggered. Any
> advice appreciated.

It's triggered on upcalls, so for example if you flush the export caches
with exportfs -f and then send an rpc with a filehandle, it should call
svc_defer on that request.

--b.
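
For anyone following along, the length computation Chuck describes looks
roughly like the sketch below. This is a simplified paraphrase of
nfsd_cache_csum() in fs/nfsd/nfscache.c, not the verbatim function; the
function name here is made up and the page-array walk is elided:

/*
 * Simplified sketch (not the verbatim fs/nfsd/nfscache.c code) of the
 * computation described above. The DRC checksums up to RC_CSUMLEN
 * bytes of the Call, starting at rq_arg.head. If GSS unwrap leaves a
 * stale, too-large head.iov_len behind, csum_len covers more than the
 * clear-text message, and stale xdr_buf bytes get checksummed.
 */
static __wsum nfsd_cache_csum_sketch(struct svc_rqst *rqstp)
{
	struct xdr_buf *buf = &rqstp->rq_arg;
	size_t csum_len = min_t(size_t,
				buf->head[0].iov_len + buf->page_len,
				RC_CSUMLEN);
	size_t len = min(buf->head[0].iov_len, csum_len);
	__wsum csum;

	/* Checksum the head buffer first ... */
	csum = csum_partial(buf->head[0].iov_base, len, 0);
	csum_len -= len;

	/* ... then continue into the page array until csum_len more
	 * bytes have been summed (page walk elided in this sketch). */
	return csum;
}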
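
And the xdr_buf_subsegment() approach Chuck mentions for
integ_unwrap_data could look something like this. This is a
hypothetical, untested sketch based only on his description, not the
actual patch; the helper name is invented:

/*
 * Hypothetical sketch of the krb5i fix described above. After the MIC
 * has been verified, rewrite rq_arg so that head.iov_len, page_len,
 * and len describe only the clear-text message of @integ_len bytes.
 * xdr_buf_subsegment() recomputes the head/page/tail geometry for the
 * byte range [0, integ_len) of an existing xdr_buf.
 */
static int trim_arg_to_cleartext(struct svc_rqst *rqstp,
				 unsigned int integ_len)
{
	struct xdr_buf trimmed;

	if (xdr_buf_subsegment(&rqstp->rq_arg, &trimmed, 0, integ_len))
		return -EINVAL;

	/* NB: a real fix would need to be more careful here; this
	 * wholesale assignment also replaces fields like buflen and
	 * the pages pointer. */
	rqstp->rq_arg = trimmed;
	return 0;
}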