On Thu, 2022-03-31 at 18:53 +0000, Chuck Lever III wrote: > > > On Mar 31, 2022, at 12:24 PM, Trond Myklebust > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > On Thu, 2022-03-31 at 16:20 +0000, Chuck Lever III wrote: > > > > > > > On Mar 31, 2022, at 12:15 PM, Trond Myklebust > > > > <trondmy@xxxxxxxxxxxxxxx> wrote: > > > > > > > > Hmm... Here's another thought. What if this were a deferred > > > > request > > > > that is being replayed after an upcall to mountd or the > > > > idmapper? > > > > It > > > > would mean that the synchronous wait in cache_defer_req() > > > > failed, > > > > so it > > > > is going to be rare, but it could happen on a congested system. > > > > > > > > AFAICS, svc_defer() does _not_ save rqstp->rq_xprt_ctxt, so > > > > svc_deferred_recv() won't restore it either. > > > > > > True, but TCP and UDP both use rq_xprt_ctxt, so wouldn't we have > > > seen this problem before on a socket transport? > > > > TCP does not set rq_xprt_ctxt, and nobody really uses UDP these > > days. > > > > > I need to audit code to see if saving rq_xprt_ctxt in svc_defer() > > > is safe and reasonable to do. Maybe Bruce has a thought. > > > > It should be safe for the UDP case, AFAICS. I have no opinion as of > > yet > > about how safe it is to do with RDMA. > > It's plausible that a deferred request could be replayed, but I don't > understand the deferral mechanism enough to know whether the rctxt > would be released before the deferred request could be handled. It > doesn't look like it would, but I could misunderstand something. > > There's a longstanding testing gap here: None of my test workloads > appear to force a request deferral. I don't recall Bruce having > such a test either. > > It would be nice if we had something that could force the use of the > deferral path, like a command line option for mountd that would cause > it to sleep for several seconds before responding to an upcall. It > might also be done with the kernel's fault injector. > You just need some mechanism that causes svc_get_next_xprt() to set rqstp->rq_chandle.thread_wait to the value '0'. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx