Re: Question about RDMA server...

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Mon, 4 Apr 2022 16:19:05 +0000

On Mon, 2022-04-04 at 15:34 +0000, Chuck Lever III wrote:
> 
> 
> > On Mar 31, 2022, at 4:08 PM, Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > 
> > On Thu, 2022-03-31 at 18:53 +0000, Chuck Lever III wrote:
> > > 
> > > > On Mar 31, 2022, at 12:24 PM, Trond Myklebust
> > > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > 
> > > > On Thu, 2022-03-31 at 16:20 +0000, Chuck Lever III wrote:
> > > > > 
> > > > > > On Mar 31, 2022, at 12:15 PM, Trond Myklebust
> > > > > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > > 
> > > > > > Hmm... Here's another thought. What if this were a deferred
> > > > > > request
> > > > > > that is being replayed after an upcall to mountd or the
> > > > > > idmapper?
> > > > > > It
> > > > > > would mean that the synchronous wait in cache_defer_req()
> > > > > > failed,
> > > > > > so it
> > > > > > is going to be rare, but it could happen on a congested
> > > > > > system.
> > > > > > 
> > > > > > AFAICS, svc_defer() does _not_ save rqstp->rq_xprt_ctxt, so
> > > > > > svc_deferred_recv() won't restore it either.
> > > > > 
> > > > > True, but TCP and UDP both use rq_xprt_ctxt, so wouldn't we
> > > > > have
> > > > > seen this problem before on a socket transport?
> > > > 
> > > > TCP does not set rq_xprt_ctxt, and nobody really uses UDP these
> > > > days.
> > > > 
> > > > > I need to audit code to see if saving rq_xprt_ctxt in
> > > > > svc_defer()
> > > > > is safe and reasonable to do. Maybe Bruce has a thought.
> > > > 
> > > > It should be safe for the UDP case, AFAICS. I have no opinion
> > > > as of
> > > > yet
> > > > about how safe it is to do with RDMA.
> > > 
> > > It's plausible that a deferred request could be replayed, but I
> > > don't
> > > understand the deferral mechanism enough to know whether the
> > > rctxt
> > > would be released before the deferred request could be handled.
> > > It
> > > doesn't look like it would, but I could misunderstand something.
> > > 
> > > There's a longstanding testing gap here: None of my test
> > > workloads
> > > appear to force a request deferral. I don't recall Bruce having
> > > such a test either.
> > > 
> > > It would be nice if we had something that could force the use of
> > > the
> > > deferral path, like a command line option for mountd that would
> > > cause
> > > it to sleep for several seconds before responding to an upcall.
> > > It
> > > might also be done with the kernel's fault injector.
> > 
> > You just need some mechanism that causes svc_get_next_xprt() to set
> > rqstp->rq_chandle.thread_wait to the value '0'.
> 
> I gotta ask: Why does cache_defer_req() check if thread_wait is zero?
> Is there some code in the kernel now that sets it to zero? I see only
> these two values:
> 
>  784         if (!test_bit(SP_CONGESTED, &pool->sp_flags))
>  785                 rqstp->rq_chandle.thread_wait = 5*HZ;
>  786         else
>  787                 rqstp->rq_chandle.thread_wait = 1*HZ;

That test goes all the way back to commit f16b6e8d838b ("sunrpc/cache:
allow threads to block while waiting for cache update."), so probably
more of a question for Neil.
I don't see any immediate problems with removing it: even if someone
does set a value of 0, the wait code itself should cope.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx