Re: Question about RDMA server...

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Mon, 4 Apr 2022 15:34:17 +0000

> On Mar 31, 2022, at 4:08 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Thu, 2022-03-31 at 18:53 +0000, Chuck Lever III wrote:
>> 
>>> On Mar 31, 2022, at 12:24 PM, Trond Myklebust
>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>> 
>>> On Thu, 2022-03-31 at 16:20 +0000, Chuck Lever III wrote:
>>>> 
>>>>> On Mar 31, 2022, at 12:15 PM, Trond Myklebust
>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>> 
>>>>> Hmm... Here's another thought. What if this were a deferred
>>>>> request
>>>>> that is being replayed after an upcall to mountd or the
>>>>> idmapper?
>>>>> It
>>>>> would mean that the synchronous wait in cache_defer_req()
>>>>> failed,
>>>>> so it
>>>>> is going to be rare, but it could happen on a congested system.
>>>>> 
>>>>> AFAICS, svc_defer() does _not_ save rqstp->rq_xprt_ctxt, so
>>>>> svc_deferred_recv() won't restore it either.
>>>> 
>>>> True, but TCP and UDP both use rq_xprt_ctxt, so wouldn't we have
>>>> seen this problem before on a socket transport?
>>> 
>>> TCP does not set rq_xprt_ctxt, and nobody really uses UDP these
>>> days.
>>> 
>>>> I need to audit code to see if saving rq_xprt_ctxt in svc_defer()
>>>> is safe and reasonable to do. Maybe Bruce has a thought.
>>> 
>>> It should be safe for the UDP case, AFAICS. I have no opinion as of
>>> yet
>>> about how safe it is to do with RDMA.
>> 
>> It's plausible that a deferred request could be replayed, but I don't
>> understand the deferral mechanism enough to know whether the rctxt
>> would be released before the deferred request could be handled. It
>> doesn't look like it would, but I could misunderstand something.
>> 
>> There's a longstanding testing gap here: None of my test workloads
>> appear to force a request deferral. I don't recall Bruce having
>> such a test either.
>> 
>> It would be nice if we had something that could force the use of the
>> deferral path, like a command line option for mountd that would cause
>> it to sleep for several seconds before responding to an upcall. It
>> might also be done with the kernel's fault injector.
> 
> You just need some mechanism that causes svc_get_next_xprt() to set
> rqstp->rq_chandle.thread_wait to the value '0'.

I gotta ask: Why does cache_defer_req() check if thread_wait is zero?
Is there some code in the kernel now that sets it to zero? I see only
these two values:

 784         if (!test_bit(SP_CONGESTED, &pool->sp_flags))
 785                 rqstp->rq_chandle.thread_wait = 5*HZ;
 786         else
 787                 rqstp->rq_chandle.thread_wait = 1*HZ;

--
Chuck Lever