On Mon, Nov 29, 2021 at 1:08 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

[...]

> > > > Another big concern I have is that you removed UMCG_TF_LOCKED. I
> > >
> > > OOh yes, I forgot to mention that. I couldn't figure out what it was
> > > supposed to do.

[...]

>
> So then A does:
>
>   A::next_tid = C.tid;
>   sys_umcg_wait();
>
> Which will:
>
>   pin(A);
>   pin(S0);
>
>   cmpxchg(A::state, RUNNING, RUNNABLE);

Hmm.... That's another difference between your patch and mine: my
approach was "the side that initiates the change updates the state". So
in my code the userspace changes the current task's state
RUNNING => RUNNABLE and the next task's state, or the server's state,
RUNNABLE => RUNNING before calling sys_umcg_wait(). The kernel changed
worker states to BLOCKED/RUNNABLE during block/wake detection, and marked
servers RUNNING when waking them during block/wake detection; but all
applicable state changes for sys_umcg_wait() happen in the userspace.

The reasoning behind this approach was:
- do in kernel only that which cannot be done in the userspace, to make
  the kernel code smaller/simpler
- similar to how futexes work: futex_wait does not change the futex
  value to the desired value, but just checks whether the futex value
  matches the desired value
- similar to how futexes work, concurrent state changes can happen in
  the userspace without calling into the kernel at all
  for example:
    - (a): worker A goes to sleep into sys_umcg_wait()
    - (b): worker B wants to context switch into worker A "a moment"
      later
    - due to preemption/interrupts/pagefaults/whatnot, (b) happens
      in reality before (a)
  in my patchset, the situation above happily resolves in the userspace
  so that worker A keeps running without ever calling sys_umcg_wait().

Again, I don't think this is deal breaking, and your approach will work,
just a bit less efficiently in some cases :)

I'm still not sure we can live without UMCG_TF_LOCKED.
What if worker A transfers its server to worker B, which A intends to
context switch into, and then worker A pagefaults or gets interrupted
before calling sys_umcg_wait()? The server will be woken up and will see
that it is assigned to worker B; now what? If worker A is "locked" before
the whole thing starts, the pagefault/interrupt will not trigger
block/wake detection, worker A will keep RUNNING for all intents and
purposes, and will eventually call sys_umcg_wait() as it had intended...

[...]
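
To make the (a)/(b) example from the futex comparison concrete, here is a
minimal userspace sketch of the "side that initiates the change updates
the state" idea, using C11 atomics. Everything named sketch_* or SK_* is
made up for illustration and is not the UMCG API or its real constants.
The sketch_should_sleep() check is only my assumption about how the race
could resolve, by analogy with futex_wait checking the futex value
instead of setting it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical values for illustration; not the real UMCG constants. */
enum sketch_state { SK_RUNNING = 1, SK_RUNNABLE = 2 };

struct sketch_task {
	_Atomic int state;
};

/* (a) Worker A, about to sleep, flips its own state in userspace. */
static bool sketch_prepare_wait(struct sketch_task *self)
{
	int expected = SK_RUNNING;

	assert(self);
	return atomic_compare_exchange_strong(&self->state, &expected,
					      SK_RUNNABLE);
}

/* (b) Worker B, switching into @next, flips next's state in userspace. */
static bool sketch_make_running(struct sketch_task *next)
{
	int expected = SK_RUNNABLE;

	assert(next);
	return atomic_compare_exchange_strong(&next->state, &expected,
					      SK_RUNNING);
}

/*
 * futex_wait-style check (assumed): only sleep if the state is still
 * RUNNABLE; if someone already made us RUNNING, skip the syscall.
 */
static bool sketch_should_sleep(struct sketch_task *self)
{
	assert(self);
	return atomic_load(&self->state) == SK_RUNNABLE;
}

/*
 * Replays the race in a single thread: B's switch-to lands after A has
 * marked itself RUNNABLE but before A enters the kernel.
 * Returns true if A ends up not needing the syscall at all.
 */
static bool sketch_race_resolved_in_userspace(void)
{
	struct sketch_task a = { .state = SK_RUNNING };
	bool a_ok = sketch_prepare_wait(&a);	/* A: RUNNING => RUNNABLE */
	bool b_ok = sketch_make_running(&a);	/* B: RUNNABLE => RUNNING */

	return a_ok && b_ok && !sketch_should_sleep(&a);
}
```

With kernel-side cmpxchg(A::state, RUNNING, RUNNABLE) inside
sys_umcg_wait(), A has to enter the kernel for the transition to happen
at all, which is where the "bit less efficiently in some cases" comes
from.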