On Wed, Jul 05, 2023 at 11:08:32AM +1000, NeilBrown wrote:
> 
> I've been pondering this scheduling mechanism in sunrpc/svc some more,
> and I wonder if rather than optimising the search, we should eliminate
> it.
> 
> Instead we could have a linked list of idle threads using llist.h
> 
> svc_enqueue_xprt calls llist_del_first() and if the result is not NULL,
> that thread is deemed busy (because it isn't on the list) and is woken.
> 
> choose_victim() could also use llist_del_first(). If nothing is there
> it could set a flag which gets cleared by the next thread to go idle.
> That thread exits .. or something. Some interlock would be needed but
> it shouldn't be too hard.
> 
> svc_exit_thread would have difficulty removing itself from the idle
> list, if it wasn't busy.. Possibly we could disallow that case (I think
> sending a signal to a thread can make it exit while idle).
> Alternately we could use llist_del_all() to clear the list, then wake
> them all up so that they go back on the list if there is nothing to do
> and if they aren't trying to exit. That is fairly heavy handed, but
> isn't a case that we need to optimise.
> 
> If you think that might be worth pursuing, I could have a go at writing
> the patch - probably on top of all the cleanups in your series before
> the xarray is added.

The thread pool is effectively a cached resource, so it's a use case
that fits llist well. svcrdma already uses llist in a few spots in
exactly that capacity.

If you think you can meet all of the criteria in the table at the top
of llist.h, so that thread scheduling works entirely without a lock,
that might be an interesting point of comparison. (A rough sketch of
how I read the idea appears at the bottom of this message.)

My only concern is that the current set of candidate mechanisms
manages to use mostly the first thread, rather than round-robining
through the thread list. Using mostly one process tends to be more
cache-friendly, so an llist-based thread scheduler should try to
preserve that behavior, IMO.

> I also wonder if we should avoid waking too many threads up at once.
> If multiple events happen in quick succession, we currently wake up
> multiple threads and if there is any scheduling delay (which is expected
> based on Commit 22700f3c6df5 ("SUNRPC: Improve ordering of transport
> processing")) then by the time the threads wake up, there may no longer
> be work to do as another thread might have gone idle and taken the work.

It might be valuable to add some observability of wake-ups that find
nothing to do. I'll look into that.

> Instead we could have a limit on the number of threads waking up -
> possibly 1 or 3. If the counter is maxed out, don't do a wake up.
> When a thread wakes up, it decrements the counter, dequeues some work,
> and if there is more to do, then it queues another task.

I consider reducing the wake-up rate to be the next step in improving
RPC service thread scalability. Any experimentation in that area would
be worth looking into. (The second sketch below shows how such a
wake-up budget might look.)

> I imagine the same basic protocol would be used for creating new threads
> when load is high - start just one at a time, though maybe a new thread
> would handle a first request before possibly starting another thread.

I envision dynamic tuning of the pool thread count as something that
should be managed in user space, since it's policy rather than
mechanism. That could be a problem, though, if we wanted to shut down
a few pool threads based on shrinker activity.
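
Here are the two sketches I referred to above. First, a minimal
version of the llist-based idle list as I understand the proposal.
All of the names here (sp_idle_threads, rq_idle,
svc_pool_wake_idle_thread, and so on) are invented for illustration;
this is not code from either of our series:

#include <linux/llist.h>
#include <linux/sched.h>

struct svc_pool {
	struct llist_head	sp_idle_threads; /* idle server threads */
	/* ... */
};

struct svc_rqst {
	struct llist_node	rq_idle; /* on sp_idle_threads while idle */
	struct task_struct	*rq_task;
	/* ... */
};

/* A thread adds itself to the idle list just before it sleeps. */
static void svc_thread_go_idle(struct svc_pool *pool, struct svc_rqst *rqstp)
{
	llist_add(&rqstp->rq_idle, &pool->sp_idle_threads);
}

/*
 * Enqueue path: pop one idle thread and wake it. A thread that is
 * not on the list is by definition busy. Per the table in llist.h,
 * llist_del_first() callers need not lock against llist_add(), but
 * they must be serialized against other llist_del_first() and
 * llist_del_all() callers, so the enqueue path, choose_victim(),
 * and the drain below would need a shared lock.
 */
static bool svc_pool_wake_idle_thread(struct svc_pool *pool)
{
	struct llist_node *node;
	struct svc_rqst *rqstp;

	node = llist_del_first(&pool->sp_idle_threads);
	if (!node)
		return false;	/* every thread is busy */

	rqstp = llist_entry(node, struct svc_rqst, rq_idle);
	wake_up_process(rqstp->rq_task);
	return true;
}

/*
 * The heavy-handed option for svc_exit_thread: drain the whole list
 * and wake everyone. Idle threads re-add themselves if there is
 * nothing to do and they aren't trying to exit.
 */
static void svc_pool_wake_all(struct svc_pool *pool)
{
	struct llist_node *list = llist_del_all(&pool->sp_idle_threads);
	struct svc_rqst *rqstp, *next;

	llist_for_each_entry_safe(rqstp, next, list, rq_idle)
		wake_up_process(rqstp->rq_task);
}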
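
And second, building on the sketch above, the wake-up budget you
describe: suppress the wake-up when enough wake-ups are already in
flight, and have each thread pay the counter back as it wakes. In a
real implementation the counter would live in struct svc_pool; a
file-scope atomic keeps the sketch self-contained:

#include <linux/atomic.h>

#define SVC_WAKEUP_BUDGET	3	/* you suggest 1 or 3 */

static atomic_t svc_pending_wakeups = ATOMIC_INIT(0);

/* Enqueue path: skip the wake-up if the budget is spent. */
static void svc_pool_wake_limited(struct svc_pool *pool)
{
	/*
	 * The read/increment pair is racy, but the budget is only a
	 * heuristic; occasionally exceeding it by one is harmless.
	 */
	if (atomic_read(&svc_pending_wakeups) >= SVC_WAKEUP_BUDGET)
		return;
	atomic_inc(&svc_pending_wakeups);
	if (!svc_pool_wake_idle_thread(pool))
		atomic_dec(&svc_pending_wakeups); /* no idle thread found */
}

/* Each thread decrements the counter as it wakes, before dequeuing. */
static void svc_thread_wakeup(struct svc_pool *pool)
{
	atomic_dec(&svc_pending_wakeups);
	/* dequeue work; if more remains, wake another thread */
}

A wake-up that decrements the counter and then finds nothing to
dequeue is exactly the event worth counting for the observability I
mentioned above.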