On Mon, Jul 31, 2023 at 07:16:01PM -0400, Chuck Lever wrote: > On Mon, Jul 31, 2023 at 04:48:33PM +1000, NeilBrown wrote: > > svc_get_next_xprt() does a lot more than just get an xprt. It also > > decides if it needs to sleep, depending not only on the availability of > > xprts, but also on the need to exit or handle external work > > (SP_TASK_PENDING). > > > > So rename it to svc_rqst_wait_and_dequeue_work(), don't return the xprt > > (which can easily be found in rqstp->rq_xprt), and restructure to make a > > clear separation between waiting and dequeueing. > > For me, the most valuable part of this patch is the last part here: > refactoring the dequeue and the wait, and deduplicating the dequeue. > > > > All the scheduling-related code like try_to_freeze() and > > kthread_should_stop() is moved into svc_rqst_wait_and_dequeue_work(). > > > > Rather than calling svc_xprt_dequeue() twice (before and after deciding > > to wait), it now calls rqst_should_sleep() twice. If the first fails, > > we skip all the waiting code completely. In the waiting code we call > > again after setting the task state in case we missed a wake-up. > > > > We now only have one call to try_to_freeze() and one call to > > svc_xprt_dequeue(). We still have two calls to kthread_should_stop() - > > one in rqst_should_sleep() to avoid sleeping, and one afterwards to > > avoid dequeueing any work (it previously came after dequeueing which > > doesn't seem right). > > > > Signed-off-by: NeilBrown <neilb@xxxxxxx> > > --- > > net/sunrpc/svc_xprt.c | 62 +++++++++++++++++++++---------------------- > > 1 file changed, 31 insertions(+), 31 deletions(-) > > > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c > > index 380fb3caea4c..67f2b34cb8e4 100644 > > --- a/net/sunrpc/svc_xprt.c > > +++ b/net/sunrpc/svc_xprt.c > > @@ -722,47 +722,51 @@ rqst_should_sleep(struct svc_rqst *rqstp) > > return true; > > } > > > > -static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp) > > +static void svc_rqst_wait_and_dequeue_work(struct svc_rqst *rqstp) > > It would be simpler to follow if you renamed this function once > (here), and changed directly from returning struct svc_xprt to > returning bool. > > > > { > > struct svc_pool *pool = rqstp->rq_pool; > > + bool slept = false; > > > > /* rq_xprt should be clear on entry */ > > WARN_ON_ONCE(rqstp->rq_xprt); > > > > - rqstp->rq_xprt = svc_xprt_dequeue(pool); > > - if (rqstp->rq_xprt) { > > - trace_svc_pool_polled(rqstp); > > - goto out_found; > > + if (rqst_should_sleep(rqstp)) { > > + set_current_state(TASK_IDLE); > > + smp_mb__before_atomic(); > > + clear_bit(SP_CONGESTED, &pool->sp_flags); > > + clear_bit(RQ_BUSY, &rqstp->rq_flags); > > + smp_mb__after_atomic(); > > + > > + /* Need to test again after setting task state */ > > This comment isn't illuminating. It needs to explain the "need to > test again". > > > > + if (likely(rqst_should_sleep(rqstp))) { > > Is likely() still needed here? > > > > + schedule(); > > + slept = true; > > + } else { > > + __set_current_state(TASK_RUNNING); > > + cond_resched(); > > This makes me happy. Only call cond_resched() if we didn't sleep. > > > > + } > > + set_bit(RQ_BUSY, &rqstp->rq_flags); > > + smp_mb__after_atomic(); > > } > > - > > - set_current_state(TASK_IDLE); > > - smp_mb__before_atomic(); > > - clear_bit(SP_CONGESTED, &pool->sp_flags); > > - clear_bit(RQ_BUSY, &rqstp->rq_flags); > > - smp_mb__after_atomic(); > > - > > - if (likely(rqst_should_sleep(rqstp))) > > - schedule(); > > - else > > - __set_current_state(TASK_RUNNING); > > - > > try_to_freeze(); > > > > - set_bit(RQ_BUSY, &rqstp->rq_flags); > > - smp_mb__after_atomic(); > > + if (kthread_should_stop()) > > + return; > > + > > clear_bit(SP_TASK_PENDING, &pool->sp_flags); > > rqstp->rq_xprt = svc_xprt_dequeue(pool); > > if (rqstp->rq_xprt) { > > - trace_svc_pool_awoken(rqstp); > > + if (slept) > > + trace_svc_pool_awoken(rqstp); > > + else > > + trace_svc_pool_polled(rqstp); > > Again, it would perhaps be better if we rearranged this code first, > and then added tracepoints. This is ... well, ugly. I've dropped the three tracepoint patches and pushed out the changes to topic-sunrpc-thread-scheduling . We can circle back to adding tracepoints once this code has settled. > > goto out_found; > > } > > > > - if (kthread_should_stop()) > > - return NULL; > > - percpu_counter_inc(&pool->sp_threads_no_work); > > - return NULL; > > + if (slept) > > + percpu_counter_inc(&pool->sp_threads_no_work); > > + return; > > out_found: > > - clear_bit(SP_TASK_PENDING, &pool->sp_flags); > > /* Normally we will wait up to 5 seconds for any required > > * cache information to be provided. > > */ > > @@ -770,7 +774,6 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp) > > rqstp->rq_chandle.thread_wait = 5*HZ; > > else > > rqstp->rq_chandle.thread_wait = 1*HZ; > > - return rqstp->rq_xprt; > > } > > > > static void svc_add_new_temp_xprt(struct svc_serv *serv, struct svc_xprt *newxpt) > > @@ -854,12 +857,9 @@ void svc_recv(struct svc_rqst *rqstp) > > if (!svc_alloc_arg(rqstp)) > > goto out; > > > > - try_to_freeze(); > > - cond_resched(); > > - if (kthread_should_stop()) > > - goto out; > > + svc_rqst_wait_and_dequeue_work(rqstp); > > > > - xprt = svc_get_next_xprt(rqstp); > > + xprt = rqstp->rq_xprt; > > if (!xprt) > > goto out; > > > > -- > > 2.40.1 > > > > -- > Chuck Lever -- Chuck Lever