On Sun, Jul 11, 2021 at 11:29 AM Thierry Delisle <tdelisle@xxxxxxxxxxxx> wrote:
>
> > Let's move the discussion to the new thread.
>
> I'm happy to start a new thread. I'm re-responding to my last post
> because many of my questions are still unanswered.
>
> > + * State transitions:
> > + *
> > + * RUNNING => IDLE: the current RUNNING task becomes IDLE by calling
> > + *                  sys_umcg_wait();
> >
> > [...]
> >
> > +/**
> > + * enum umcg_wait_flag - flags to pass to sys_umcg_wait
> > + * @UMCG_WAIT_WAKE_ONLY: wake @self->next_tid, don't put @self to sleep;
> > + * @UMCG_WF_CURRENT_CPU: wake @self->next_tid on the current CPU
> > + *                       (use WF_CURRENT_CPU); @UMCG_WAIT_WAKE_ONLY
> > + *                       must be set.
> > + */
> > +enum umcg_wait_flag {
> > +        UMCG_WAIT_WAKE_ONLY   = 1,
> > +        UMCG_WF_CURRENT_CPU   = 2,
> > +};
>
> What is the purpose of using sys_umcg_wait without next_tid or with
> UMCG_WAIT_WAKE_ONLY? It looks like Java's park/unpark semantics to me,
> that is, worker threads can use this for synchronization and mutual
> exclusion. In this case, how do these compare to using
> FUTEX_WAIT/FUTEX_WAKE?

sys_umcg_wait without next_tid puts the task into UMCG_IDLE state; a wake
wakes it. These are standard sched operations. If they are emulated via
futexes, fast context switching will require something like FUTEX_SWAP,
which was NACKed last year. (A park/unpark sketch is included further
down in this reply.)

> > +struct umcg_task {
> > [...]
> > +        /**
> > +         * @server_tid: the TID of the server UMCG task that should be
> > +         * woken when this WORKER becomes BLOCKED. Can be zero.
> > +         *
> > +         * If this is a UMCG server, @server_tid should
> > +         * contain the TID of @self - it will be used to find
> > +         * the task_struct to wake when pulled from
> > +         * @idle_servers.
> > +         *
> > +         * Read-only for the kernel, read/write for the userspace.
> > +         */
> > +        uint32_t        server_tid;             /* r   */
> > [...]
> > +        /**
> > +         * @idle_servers_ptr: a single-linked list pointing to the list
> > +         * of idle servers. Can be NULL.
> > +         *
> > +         * Readable/writable by both the kernel and the userspace: the
> > +         * userspace adds items to the list, the kernel removes them.
> > +         *
> > +         * TODO: describe how the list works.
> > +         */
> > +        uint64_t        idle_servers_ptr;       /* r/w */
> > [...]
> > +} __attribute__((packed, aligned(8 * sizeof(__u64))));
>
> From the comments and by elimination, I'm guessing that idle_servers_ptr
> is somehow used by servers to block until some worker threads become
> idle. However, I do not understand how the userspace is expected to use
> it. I also do not understand whether these link fields form a stack or a
> queue, and where the head is.

When a server has nothing to do (no work to run), it is put into IDLE
state and added to the list. The kernel wakes an IDLE server if a blocked
worker unblocks.
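To illustrate how I'd expect the list to be used, here is a rough sketch
of the server side. This is only an outline under my reading of the
patch: the location of the list head, the helper name, and the
__NR_umcg_wait syscall number are placeholders, and a real implementation
would follow the documented cmpxchg protocol for the state word rather
than a plain store:

#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/umcg.h>         /* uapi header added by this patchset */

/* Placeholder: wherever the group keeps the head of the idle-servers
 * list that the idle_servers_ptr fields chain through. */
extern _Atomic uint64_t idle_servers_head;

/* Server side: nothing to run, so push self onto the idle-servers list
 * (userspace adds items, the kernel removes them) and sleep in
 * sys_umcg_wait() until the kernel wakes us because a blocked worker
 * unblocked. */
static void server_wait_for_work(struct umcg_task *self)
{
        uint64_t head = atomic_load(&idle_servers_head);

        do {
                self->idle_servers_ptr = head;    /* link to old head */
        } while (!atomic_compare_exchange_weak(&idle_servers_head, &head,
                                               (uint64_t)(uintptr_t)self));

        self->state = UMCG_TASK_IDLE;   /* simplified state transition */
        syscall(__NR_umcg_wait, 0, 0);  /* flags == 0, no timeout */
}

In this sketch the list is a LIFO stack pushed with a compare-and-swap;
whether the kernel treats it as a stack or a queue when it removes
entries is exactly the part the TODO above still needs to document.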
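Going back to the park/unpark comparison: with the flags above, park and
unpark fall out of sys_umcg_wait almost directly. Again a sketch with the
same caveats and includes as the previous snippet; the wrapper names are
mine, and the (flags, abs_timeout) argument list is how I read the patch:

/* park: put the calling task to sleep in IDLE state until it is woken;
 * no next_tid is set, so nothing else is woken by this call. */
static int umcg_park(struct umcg_task *self)
{
        self->next_tid = 0;
        self->state = UMCG_TASK_IDLE;   /* simplified, as above */
        return syscall(__NR_umcg_wait, 0, 0);
}

/* unpark: wake the parked task @tid without putting the caller to
 * sleep; UMCG_WF_CURRENT_CPU may be ORed in to hint the wake site. */
static int umcg_unpark(struct umcg_task *self, uint32_t tid)
{
        self->next_tid = tid;
        return syscall(__NR_umcg_wait, UMCG_WAIT_WAKE_ONLY, 0);
}

Functionally this is close to FUTEX_WAIT/FUTEX_WAKE, but because the
kernel tracks the UMCG state of both tasks it can also perform the
combined wait-and-wake-other transition cheaply, which is what FUTEX_SWAP
was trying to add on the futex side.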
> > +/**
> > + * sys_umcg_ctl: (un)register a task as a UMCG task.
> > + * @flags:       ORed values from enum umcg_ctl_flag; see below;
> > + * @self:        a pointer to struct umcg_task that describes this
> > + *               task and governs the behavior of sys_umcg_wait if
> > + *               registering; must be NULL if unregistering.
> > + *
> > + * @flags & UMCG_CTL_REGISTER: register a UMCG task:
> > + *         UMCG workers:
> > + *              - self->state must be UMCG_TASK_IDLE
> > + *              - @flags & UMCG_CTL_WORKER
> > + *
> > + *         If the conditions above are met, sys_umcg_ctl() immediately
> > + *         returns if the registered task is a RUNNING server or basic
> > + *         task; an IDLE worker will be added to idle_workers_ptr, and
> > + *         the worker put to sleep; an idle server from
> > + *         idle_servers_ptr will be woken, if any.
>
> This approach to creating UMCG workers concerns me a little. My
> understanding is that in general, the number of servers controls the
> amount of parallelism in the program. But in the case of creating new
> UMCG workers, the new threads only respect the M:N threading model
> after sys_umcg_ctl has blocked. What does this mean for applications
> that create thousands of short-lived tasks? Are users expected to
> create pools of reusable UMCG workers?

Yes: task/thread creation is not as lightweight as just posting work
items onto a preexisting pool of workers (see the pool-worker sketch at
the end of this email).

> I would suggest adding at least one uint64_t field to the struct
> umcg_task that is left as-is by the kernel. This allows implementers of
> user-space schedulers to add scheduler-specific data structures to the
> threads without needing some kind of table on the side.

This is usually achieved by embedding the kernel struct into a larger
userspace/TLS struct. For example:

struct umcg_task_user {
        struct umcg_task umcg_task;
        extra_user_data  d1;
        extra_user_ptr   p1;
        /* etc. */
} __aligned(...);
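The kernel only ever looks at the embedded struct umcg_task, so the
scheduler can always get from the kernel-visible pointer back to its own
struct with the usual container_of arithmetic, without any side table.
A small sketch (the to_user_task helper is illustrative; container_of is
the common userspace rendition of the macro):

#include <stddef.h>

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* kt: a struct umcg_task pointer reported by the kernel, e.g. one
 * pulled off an idle list. */
static struct umcg_task_user *to_user_task(struct umcg_task *kt)
{
        return container_of(kt, struct umcg_task_user, umcg_task);
}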
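And the pool-worker sketch promised above: each worker registers once
via sys_umcg_ctl and then loops over a userspace work queue, so the
thread creation cost is paid only at pool setup. This is only an
outline: __NR_umcg_ctl, get_work() and run_work() are placeholders; the
UMCG_CTL_* flags and the UMCG_TASK_IDLE precondition are from the doc
comment quoted above.

#include <sys/syscall.h>
#include <unistd.h>
#include <linux/umcg.h>          /* uapi header added by this patchset */

struct work_item;                        /* placeholder work queue API */
extern struct work_item *get_work(void); /* may park when queue empty */
extern void run_work(struct work_item *w);

/* A reusable pool worker: the registration cost is paid once, after
 * which short-lived tasks are just work items posted to the queue. */
static void *pool_worker(void *arg)
{
        struct umcg_task *self = arg;

        self->state = UMCG_TASK_IDLE;  /* precondition for UMCG_CTL_WORKER */
        syscall(__NR_umcg_ctl, UMCG_CTL_REGISTER | UMCG_CTL_WORKER, self);
        /* Per the doc comment above, an IDLE worker is put to sleep here
         * until a server runs it. */

        for (;;)
                run_work(get_work());
}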