Hi Thierry!

Again, it seems you are ascribing higher-level semantics to lower-level
constructs here, specifically to the "idle workers list". In this
patchset, the idle workers list is just a mechanism the kernel uses to
notify the userspace that previously blocked workers are now available
for scheduling. Maybe a better name would have been "unblocked workers
list". The true list of idle workers that the userspace scheduler can
schedule is maintained by the userspace. Put differently: if a worker is
on the kernel's idle workers list, it is NOT on the userspace's idle
workers list, so workers on the kernel's idle workers list are not yet
fully "idle workers" that can be scheduled.

On Tue, Oct 12, 2021 at 11:46 AM Thierry Delisle <tdelisle@xxxxxxxxxxxx> wrote:
>
> >> Just to be clear, sys_umcg_wait supports an operation that, when called
> >> from a worker, puts the worker to sleep without triggering block detection
> >> or context-switching back to the server?
> >
> > Potentially, yes - when a worker wants to yield (e.g. as part of a
> > custom UMCG-aware mutex/condvar code), and calls into the userspace
> > scheduler, it may be faster to skip the server wakeup (e.g. reassign
> > the server to another sleeping worker and wake this worker). This is
> > not a supported operation right now, but I see how it could be used to
> > optimize some things in the future.
> >
> > Do you have any concerns here?
>
> To be honest, I did not realize this was a possibility until your previous
> email. I'm not sure I buy your example, it just sounds like worker-to-worker
> context switching, but I could imagine "stop the world" cases or some "race
> to idle" policy using this feature.
>
> It seems to me the corresponding wake needs to know if it needs to enqueue
> the worker into the idle workers list or if it should just schedule the
> worker as it would a server.
>
> How does the wake know which to do?

If the worker is IDLE and the userspace knows about it (i.e. the worker
is NOT on the kernel's idle workers list), the userspace can either
directly schedule the worker with a server (mark it RUNNING, assign a
server, etc.) or instruct the kernel to put the worker onto the kernel's
idle workers list, so that it can later be picked up by a userspace
thread checking that list. This last operation is provided for cases
when, for example, a server wants to run IDLE worker A and also knows
about IDLE worker B, but for some reason cannot properly process worker
B at the moment; by putting worker B back onto the kernel's idle workers
list, the server delegates the processing of worker B to another server
in the future.

Both of these state transitions are explicitly covered in the
documentation.

Again, you appear to be trying to equate the kernel UMCG API with
higher-level userspace scheduling notions/facilities, while in reality
the kernel UMCG API is a low-level facility that will be _used_ to
construct the higher-level behaviors/semantics you are asking about.
I suggest you wait until I post the userspace library before raising
these higher-level semantic questions.

Thanks,
Peter

[...]
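
P.S. To ground the two wake paths above in something concrete, here is a
rough userspace-side sketch. To be clear, this is NOT code from the
patchset or from the upcoming library; every name in it (struct worker,
umcg_pop_kernel_idle_list, umcg_push_kernel_idle_list, umcg_run_worker,
the state values) is invented for illustration, and the kernel
interactions are stubbed out.

/* sketch.c - illustrative only; not the patchset's actual API.
 * All struct names, helpers, and state values below are assumptions
 * invented for this example.
 */
#include <stdatomic.h>
#include <stddef.h>

enum wstate { W_IDLE, W_RUNNING, W_BLOCKED };

struct server;

struct worker {
	_Atomic int state;      /* one of enum wstate */
	struct worker *next;    /* intrusive list link */
};

/* The userspace scheduler's own idle list - the "true" list of
 * schedulable workers. */
static struct worker *us_idle_list = NULL;

/* Stand-ins for the real kernel interaction; bodies elided. */
static struct worker *umcg_pop_kernel_idle_list(void) { return NULL; }
static void umcg_push_kernel_idle_list(struct worker *w) { (void)w; }
static void umcg_run_worker(struct server *srv, struct worker *w)
{ (void)srv; (void)w; }

/* Step 1: drain the kernel's "unblocked workers" notification list
 * into the userspace idle list. Only after this do these workers
 * count as idle from the scheduler's point of view. */
static void drain_unblocked_workers(void)
{
	struct worker *w;

	while ((w = umcg_pop_kernel_idle_list()) != NULL) {
		w->next = us_idle_list;
		us_idle_list = w;
	}
}

/* Step 2: a server picks idle worker A to run; if it also holds an
 * idle worker B that it cannot process right now, it pushes B back
 * onto the kernel's list, delegating B to whichever server drains
 * that list next. */
static void wake_a_delegate_b(struct server *srv, struct worker *a,
			      struct worker *b)
{
	int idle = W_IDLE;

	if (atomic_compare_exchange_strong(&a->state, &idle, W_RUNNING))
		umcg_run_worker(srv, a);

	if (b)
		umcg_push_kernel_idle_list(b);  /* B stays IDLE */
}

The asymmetry is the point of the sketch: draining the kernel's list is
what turns a merely "unblocked" worker into a schedulable idle worker,
while pushing a worker back onto that list is how a server delegates a
worker it cannot process.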