On Wed, Jan 19, 2022 at 12:47 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jan 18, 2022 at 10:19:21AM -0800, Peter Oskolkov wrote:
> > ============= worker-to-worker context switches
> >
> > One example: absl::Mutex (https://abseil.io/about/design/mutex) has
> > google-internal extensions that are "fiber aware". More specifically,
> > consider this situation:
> >
> > - worker W1 acquired the mutex and is doing its work
> > - worker W2 calls mutex::lock()
> > mutex::lock(), being aware of workers, understands that W2 is going to sleep;
> > so instead of just doing so, waking the server, and letting
> > the server figure out what to run in place of the sleeping worker,
> > mutex::lock()
> > calls into the userspace scheduler in the context of W2 running, and the
> > userspace scheduler then picks W3 to run and does W2->W3 context switch.
> >
> > The optimization above replaces W2->Server and Server->W3 context switches
> > with a single W2->W3 context switch, which is a material performance gain.
>
> Yes, I've also already reconsidered. Things like pipelines and other
> fixed order scheduling policies will greatly benefit from
> worker-to-worker switching.
>
> But I think all of them are explicit. That is, we can limit the
> ::next_tid usage to sys_umcg_wait() and never look at it for implicit
> blocks.

Yes, of course - when a worker blocks, its server gets notified.

>
> > In addition, when W1 calls mutex::unlock(), the scheduling code determines
> > that W2 is waiting on the mutex, and thus calls W2::wake() from the context of
> > running W1 (you asked earlier why do we need "WAKE_ONLY").
>
> This I'm not at all convinced on. That sounds like it will violate the
> 1:1 thing.

wake_only is a wakeup event, meaning the worker gets added to the wake
queue, not scheduled on a CPU; we don't have to implement it in the
kernel, though - the userspace may keep its own wake queue for workers
like this. So feel free to ignore this operation.
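
To make the "userspace keeps its own wake queue" option concrete, here
is a rough sketch of what I mean. All names (WakeQueue, FiberMutex,
WakeOnly, PickNext) are made up for illustration and are not part of the
UMCG patches or of absl::Mutex; the mutex does bookkeeping only, with no
real blocking or context switching, and it assumes the woken worker
retries lock() once it actually runs.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical types for illustration only; not a UMCG or absl API.
using WorkerId = int;
const WorkerId kNoWorker = -1;

// FIFO of workers that have been woken but not yet put on a CPU.
class WakeQueue {
 public:
  // "wake only": mark the worker runnable; do NOT switch to it.
  void WakeOnly(WorkerId w) { runnable_.push_back(w); }

  // Called by whoever decides what runs next (e.g. the server).
  // Returns false if nothing is runnable.
  bool PickNext(WorkerId* out) {
    if (runnable_.empty()) return false;
    *out = runnable_.front();
    runnable_.pop_front();
    return true;
  }

  std::size_t size() const { return runnable_.size(); }

 private:
  std::deque<WorkerId> runnable_;
};

// Toy worker-aware mutex: tracks the owner and the waiters.
class FiberMutex {
 public:
  explicit FiberMutex(WakeQueue* q) : queue_(q), owner_(kNoWorker) {}

  // Returns true if `w` now owns the mutex; false if `w` was recorded
  // as a waiter (a real implementation would switch to another worker).
  bool Lock(WorkerId w) {
    if (owner_ == kNoWorker) {
      owner_ = w;
      return true;
    }
    waiters_.push_back(w);
    return false;
  }

  // Runs in the context of the owner: releases the mutex and wakes the
  // first waiter by queueing it, without running it.
  void Unlock(WorkerId w) {
    assert(owner_ == w);
    owner_ = kNoWorker;
    if (!waiters_.empty()) {
      WorkerId next = waiters_.front();
      waiters_.pop_front();
      queue_->WakeOnly(next);
    }
  }

 private:
  WakeQueue* queue_;
  WorkerId owner_;
  std::deque<WorkerId> waiters_;
};
```

With this, W1 calling Unlock() never leaves W1's context: W2 just shows
up in the wake queue, and the server (or the next explicit switch) picks
it up via PickNext().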