Re: [PATCH 06/12] SUNRPC: rename and refactor svc_get_next_xprt().

Chuck Lever <chuck.lever@xxxxxxxxxx> · Tue, 1 Aug 2023 18:46:21 -0400

On Mon, Jul 31, 2023 at 07:16:01PM -0400, Chuck Lever wrote:
> On Mon, Jul 31, 2023 at 04:48:33PM +1000, NeilBrown wrote:
> > svc_get_next_xprt() does a lot more than just get an xprt.  It also
> > decides if it needs to sleep, depending not only on the availability of
> > xprts, but also on the need to exit or handle external work
> > (SP_TASK_PENDING).
> > 
> > So rename it to svc_rqst_wait_and_dequeue_work(), don't return the xprt
> > (which can easily be found in rqstp->rq_xprt), and restructure to make a
> > clear separation between waiting and dequeueing.
> 
> For me, the most valuable part of this patch is the last part here:
> refactoring the dequeue and the wait, and deduplicating the dequeue.
> 
> 
> > All the scheduling-related code like try_to_freeze() and
> > kthread_should_stop() is moved into svc_rqst_wait_and_dequeue_work().
> > 
> > Rather than calling svc_xprt_dequeue() twice (before and after deciding
> > to wait), it now calls rqst_should_sleep() twice.  If the first fails,
> > we skip all the waiting code completely.  In the waiting code we call
> > again after setting the task state in case we missed a wake-up.
> > 
> > We now only have one call to try_to_freeze() and one call to
> > svc_xprt_dequeue().  We still have two calls to kthread_should_stop() -
> > one in rqst_should_sleep() to avoid sleeping, and one afterwards to
> > avoid dequeueing any work (it previously came after dequeueing which
> > doesn't seem right).
> > 
> > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > ---
> >  net/sunrpc/svc_xprt.c | 62 +++++++++++++++++++++----------------------
> >  1 file changed, 31 insertions(+), 31 deletions(-)
> > 
> > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > index 380fb3caea4c..67f2b34cb8e4 100644
> > --- a/net/sunrpc/svc_xprt.c
> > +++ b/net/sunrpc/svc_xprt.c
> > @@ -722,47 +722,51 @@ rqst_should_sleep(struct svc_rqst *rqstp)
> >  	return true;
> >  }
> >  
> > -static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp)
> > +static void svc_rqst_wait_and_dequeue_work(struct svc_rqst *rqstp)
> 
> It would be simpler to follow if you renamed this function once
> (here), and changed directly from returning struct svc_xprt to
> returning bool.
> 
> 
> >  {
> >  	struct svc_pool		*pool = rqstp->rq_pool;
> > +	bool slept = false;
> >  
> >  	/* rq_xprt should be clear on entry */
> >  	WARN_ON_ONCE(rqstp->rq_xprt);
> >  
> > -	rqstp->rq_xprt = svc_xprt_dequeue(pool);
> > -	if (rqstp->rq_xprt) {
> > -		trace_svc_pool_polled(rqstp);
> > -		goto out_found;
> > +	if (rqst_should_sleep(rqstp)) {
> > +		set_current_state(TASK_IDLE);
> > +		smp_mb__before_atomic();
> > +		clear_bit(SP_CONGESTED, &pool->sp_flags);
> > +		clear_bit(RQ_BUSY, &rqstp->rq_flags);
> > +		smp_mb__after_atomic();
> > +
> > +		/* Need to test again after setting task state */
> 
> This comment isn't illuminating. It needs to explain the "need to
> test again".
> 
> 
> > +		if (likely(rqst_should_sleep(rqstp))) {
> 
> Is likely() still needed here?
> 
> 
> > +			schedule();
> > +			slept = true;
> > +		} else {
> > +			__set_current_state(TASK_RUNNING);
> > +			cond_resched();
> 
> This makes me happy. Only call cond_resched() if we didn't sleep.
> 
> 
> > +		}
> > +		set_bit(RQ_BUSY, &rqstp->rq_flags);
> > +		smp_mb__after_atomic();
> >  	}
> > -
> > -	set_current_state(TASK_IDLE);
> > -	smp_mb__before_atomic();
> > -	clear_bit(SP_CONGESTED, &pool->sp_flags);
> > -	clear_bit(RQ_BUSY, &rqstp->rq_flags);
> > -	smp_mb__after_atomic();
> > -
> > -	if (likely(rqst_should_sleep(rqstp)))
> > -		schedule();
> > -	else
> > -		__set_current_state(TASK_RUNNING);
> > -
> >  	try_to_freeze();
> >  
> > -	set_bit(RQ_BUSY, &rqstp->rq_flags);
> > -	smp_mb__after_atomic();
> > +	if (kthread_should_stop())
> > +		return;
> > +
> >  	clear_bit(SP_TASK_PENDING, &pool->sp_flags);
> >  	rqstp->rq_xprt = svc_xprt_dequeue(pool);
> >  	if (rqstp->rq_xprt) {
> > -		trace_svc_pool_awoken(rqstp);
> > +		if (slept)
> > +			trace_svc_pool_awoken(rqstp);
> > +		else
> > +			trace_svc_pool_polled(rqstp);
> 
> Again, it would perhaps be better if we rearranged this code first,
> and then added tracepoints. This is ... well, ugly.

I've dropped the three tracepoint patches and pushed out the changes
to topic-sunrpc-thread-scheduling . We can circle back to adding
tracepoints once this code has settled.

> >  		goto out_found;
> >  	}
> >  
> > -	if (kthread_should_stop())
> > -		return NULL;
> > -	percpu_counter_inc(&pool->sp_threads_no_work);
> > -	return NULL;
> > +	if (slept)
> > +		percpu_counter_inc(&pool->sp_threads_no_work);
> > +	return;
> >  out_found:
> > -	clear_bit(SP_TASK_PENDING, &pool->sp_flags);
> >  	/* Normally we will wait up to 5 seconds for any required
> >  	 * cache information to be provided.
> >  	 */
> > @@ -770,7 +774,6 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp)
> >  		rqstp->rq_chandle.thread_wait = 5*HZ;
> >  	else
> >  		rqstp->rq_chandle.thread_wait = 1*HZ;
> > -	return rqstp->rq_xprt;
> >  }
> >  
> >  static void svc_add_new_temp_xprt(struct svc_serv *serv, struct svc_xprt *newxpt)
> > @@ -854,12 +857,9 @@ void svc_recv(struct svc_rqst *rqstp)
> >  	if (!svc_alloc_arg(rqstp))
> >  		goto out;
> >  
> > -	try_to_freeze();
> > -	cond_resched();
> > -	if (kthread_should_stop())
> > -		goto out;
> > +	svc_rqst_wait_and_dequeue_work(rqstp);
> >  
> > -	xprt = svc_get_next_xprt(rqstp);
> > +	xprt = rqstp->rq_xprt;
> >  	if (!xprt)
> >  		goto out;
> >  
> > -- 
> > 2.40.1
> > 
> 
> -- 
> Chuck Lever

-- 
Chuck Lever