On Wed, Jul 05, 2023 at 11:08:32AM +1000, NeilBrown wrote:
> 
> I've been pondering this scheduling mechanism in sunrpc/svc some more,
> and I wonder if rather than optimising the search, we should eliminate
> it.
> 
> Instead we could have a linked list of idle threads using llist.h
> 
> svc_enqueue_xprt calls llist_del_first() and if the result is not NULL,
> that thread is deemed busy (because it isn't on the list) and is woken.
> 
> choose_victim() could also use llist_del_first(). If nothing is there
> it could set a flag which gets cleared by the next thread to go idle.
> That thread exits .. or something. Some interlock would be needed but
> it shouldn't be too hard.
> 
> svc_exit_thread would have difficulty removing itself from the idle
> list, if it wasn't busy.. Possibly we could disallow that case (I think
> sending a signal to a thread can make it exit while idle).
> Alternately we could use llist_del_all() to clear the list, then wake
> them all up so that they go back on the list if there is nothing to do
> and if they aren't trying to exit. That is fairly heavy handed, but
> isn't a case that we need to optimise.
> 
> If you think that might be worth pursuing, I could have a go at writing
> the patch - probably on top of all the cleanups in your series before
> the xarray is added.

The thread pool is effectively a cached resource, so it's a use case
that fits llist well. svcrdma already uses llist in a few spots in
exactly that capacity.

If you think you can meet all of the criteria in the table at the top
of llist.h, so that thread scheduling works entirely without a lock,
that might be an interesting point of comparison. (A rough sketch of
how I read the idea appears at the bottom of this message.)

My only concern is that the current set of candidate mechanisms
manages to use mostly the first thread, rather than round-robining
through the thread list. Using mostly one process tends to be more
cache-friendly, so an llist-based thread scheduler should try to
preserve that behavior, IMO.

> I also wonder if we should avoid waking too many threads up at once.
> If multiple events happen in quick succession, we currently wake up
> multiple threads and if there is any scheduling delay (which is expected
> based on Commit 22700f3c6df5 ("SUNRPC: Improve ordering of transport
> processing")) then by the time the threads wake up, there may no longer
> be work to do as another thread might have gone idle and taken the work.

It might be valuable to add some observability of wake-ups that find
nothing to do. I'll look into that.

> Instead we could have a limit on the number of threads waking up -
> possibly 1 or 3. If the counter is maxed out, don't do a wake up.
> When a thread wakes up, it decrements the counter, dequeues some work,
> and if there is more to do, then it queues another task.

I consider reducing the wake-up rate to be the next step in improving
RPC service thread scalability. Any experimentation in that area would
be worth looking into. (The second sketch below shows how such a
wake-up budget might look.)

> I imagine the same basic protocol would be used for creating new threads
> when load is high - start just one at a time, though maybe a new thread
> would handle a first request before possibly starting another thread.

I envision dynamic tuning of the pool thread count as something that
should be managed in user space, since it's policy rather than
mechanism. That could be a problem, though, if we wanted to shut down
a few pool threads based on shrinker activity.
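
Here are the two sketches I referred to above. First, a minimal
version of the llist-based idle list as I understand the proposal.
All of the names here (sp_idle_threads, rq_idle,
svc_pool_wake_idle_thread, and so on) are invented for illustration;
this is not code from either of our series:

#include <linux/llist.h>
#include <linux/sched.h>

struct svc_pool {
	struct llist_head	sp_idle_threads; /* idle server threads */
	/* ... */
};

struct svc_rqst {
	struct llist_node	rq_idle; /* on sp_idle_threads while idle */
	struct task_struct	*rq_task;
	/* ... */
};

/* A thread adds itself to the idle list just before it sleeps. */
static void svc_thread_go_idle(struct svc_pool *pool, struct svc_rqst *rqstp)
{
	llist_add(&rqstp->rq_idle, &pool->sp_idle_threads);
}

/*
 * Enqueue path: pop one idle thread and wake it. A thread that is
 * not on the list is by definition busy. Per the table in llist.h,
 * llist_del_first() callers need not lock against llist_add(), but
 * they must be serialized against other llist_del_first() and
 * llist_del_all() callers, so the enqueue path, choose_victim(),
 * and the drain below would need a shared lock.
 */
static bool svc_pool_wake_idle_thread(struct svc_pool *pool)
{
	struct llist_node *node;
	struct svc_rqst *rqstp;

	node = llist_del_first(&pool->sp_idle_threads);
	if (!node)
		return false;	/* every thread is busy */

	rqstp = llist_entry(node, struct svc_rqst, rq_idle);
	wake_up_process(rqstp->rq_task);
	return true;
}

/*
 * The heavy-handed option for svc_exit_thread: drain the whole list
 * and wake everyone. Idle threads re-add themselves if there is
 * nothing to do and they aren't trying to exit.
 */
static void svc_pool_wake_all(struct svc_pool *pool)
{
	struct llist_node *list = llist_del_all(&pool->sp_idle_threads);
	struct svc_rqst *rqstp, *next;

	llist_for_each_entry_safe(rqstp, next, list, rq_idle)
		wake_up_process(rqstp->rq_task);
}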
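
And second, building on the sketch above, the wake-up budget you
describe: suppress the wake-up when enough wake-ups are already in
flight, and have each thread pay the counter back as it wakes. In a
real implementation the counter would live in struct svc_pool; a
file-scope atomic keeps the sketch self-contained:

#include <linux/atomic.h>

#define SVC_WAKEUP_BUDGET	3	/* you suggest 1 or 3 */

static atomic_t svc_pending_wakeups = ATOMIC_INIT(0);

/* Enqueue path: skip the wake-up if the budget is spent. */
static void svc_pool_wake_limited(struct svc_pool *pool)
{
	/*
	 * The read/increment pair is racy, but the budget is only a
	 * heuristic; occasionally exceeding it by one is harmless.
	 */
	if (atomic_read(&svc_pending_wakeups) >= SVC_WAKEUP_BUDGET)
		return;
	atomic_inc(&svc_pending_wakeups);
	if (!svc_pool_wake_idle_thread(pool))
		atomic_dec(&svc_pending_wakeups); /* no idle thread found */
}

/* Each thread decrements the counter as it wakes, before dequeuing. */
static void svc_thread_wakeup(struct svc_pool *pool)
{
	atomic_dec(&svc_pending_wakeups);
	/* dequeue work; if more remains, wake another thread */
}

A wake-up that decrements the counter and then finds nothing to
dequeue is exactly the event worth counting for the observability I
mentioned above.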