On Fri, Mar 28, 2014 at 07:11:56PM -0500, Tom Tucker wrote:
> Hi Bruce,
>
> On 3/28/14 4:26 PM, J. Bruce Fields wrote:
> >On Fri, Mar 28, 2014 at 10:21:27AM -0500, Tom Tucker wrote:
> >>Hi Bruce,
> >>
> >>On 3/27/14 9:08 PM, J. Bruce Fields wrote:
> >>>On Tue, Mar 25, 2014 at 03:14:57PM -0500, Steve Wise wrote:
> >>>>From: Tom Tucker <tom@xxxxxx>
> >>>>
> >>>>The server regression was caused by the addition of rq_next_page
> >>>>(afc59400d6c65bad66d4ad0b2daf879cbff8e23e). There were a few places that
> >>>>were missed with the update of the rq_respages array.
> >>>Apologies. (But, it could happen again--could we set up some regular
> >>>testing? It doesn't have to be anything fancy, just cthon over
> >>>rdma--really, just read and write over rdma--would probably catch a
> >>>lot.)
> >>I think Chelsio is going to be adding some NFSRDMA regression
> >>testing to their system test.
> >OK. Do you know who there is setting that up? I'd be curious exactly
> >what kernels they intend to test and how they plan to report results.
> >
>
> I don't know, Steve can weigh in on this...
>
> >>>Also: I don't get why all these rq_next_page initializations are
> >>>required. Why isn't the initialization at the top of svc_process()
> >>>enough? Is rdma using it before we get to that point? The only use of
> >>>it I see off hand is in the while loop that you're deleting.
> >>I didn't apply tremendous deductive powers here, I just added
> >>updates to rq_next_page wherever the transport messed with
> >>rq_respages. That said, NFS WRITE is likely the culprit since the
> >>write is completed as a deferral and therefore the request doesn't
> >>go through svc_process, so if rq_next_page is bogus, the cleanup
> >>will free/re-use pages that are actually in use by the transport.
> >Ugh, OK, without tracing through the code I guess I can see how that
> >would happen. Remind me why it's using deferrals?
>
> The server fetches the write data from the client using RDMA READ.
> So the request says ... "here's where the data is in my memory", and
> then the server issues an RDMA READ to fetch it. When the read
> completes, the deferred request is completed.

That makes sense, but maybe I'm not sure what you mean by deferring.
The tcp code can also receive a request over multiple recvfroms.  See
Trond's hack in 31d68ef65c7d4 "SUNRPC: Don't wait for full record to
receive tcp data".

--b.
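A minimal userspace sketch of the failure mode Tom describes, under stated assumptions: this is not kernel code, and it assumes reply cleanup walks backwards from a next-page marker down to the first reply page. The names struct model_rqst, init_rqst(), release_reply_pages() and set_respages() are hypothetical. The respages and next_page indices only loosely stand in for rq_respages and rq_next_page, and each page is reduced to a reference count.

```c
#include <assert.h>

#define NPAGES 8

/* Hypothetical, simplified model of a request's page array. Each slot is
 * reduced to a reference count; respages and next_page are indices that
 * loosely stand in for rq_respages and rq_next_page. Not kernel code. */
struct model_rqst {
    int refcount[NPAGES]; /* references currently held on each page */
    int respages;         /* index of the first reply page */
    int next_page;        /* index one past the last reply page */
};

/* Start with every page owned once by the request and a one-page reply
 * beginning at page 0. */
static void init_rqst(struct model_rqst *rq)
{
    for (int i = 0; i < NPAGES; i++)
        rq->refcount[i] = 1;
    rq->respages = 0;
    rq->next_page = 1;
}

/* Assumed cleanup pattern: walk next_page back down to respages, dropping
 * one reference per page. The next_page > 0 guard only keeps this model
 * inside the array when next_page is stale and already below respages.
 * Returns the number of references dropped. */
static int release_reply_pages(struct model_rqst *rq)
{
    int released = 0;

    while (rq->next_page != rq->respages && rq->next_page > 0) {
        rq->next_page--;
        rq->refcount[rq->next_page]--;
        released++;
    }
    return released;
}

/* Move the start of the reply and update next_page in the same place, so
 * the two markers cannot disagree. */
static void set_respages(struct model_rqst *rq, int first_reply_page)
{
    assert(first_reply_page >= 0 && first_reply_page < NPAGES);
    rq->respages = first_reply_page;
    rq->next_page = first_reply_page + 1; /* one reply page so far */
}
```

In this model, if the transport holds extra references on pages 0..2 for the write data and then moves respages to 3 while leaving next_page stale at 1, release_reply_pages() drops a reference on page 0, which still holds write data, and never releases reply page 3. Calling set_respages(rq, 3) instead keeps both markers together, so only page 3 is released. That mirrors Tom's reasoning for updating rq_next_page wherever rq_respages changes on paths that do not go through svc_process().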