On Mon, Jul 19, 2021 at 9:07 AM Thierry Delisle <tdelisle@xxxxxxxxxxxx> wrote:
>
> > /**
> >  * @idle_servers_ptr: a single-linked list pointing to the list
> >  * of idle servers. Can be NULL.
> >  *
> >  * Readable/writable by both the kernel and the userspace: the
> >  * userspace adds items to the list, the kernel removes them.
> >  *
> >  * This is a single-linked list (stack): head->next->next->next->NULL.
> >  * "next" nodes are idle_servers_ptr fields in struct umcg_task.
> >  *
> >  * Example:
> >  *
> >  *    a running worker            idle server 1        idle server 2
> >  *
> >  *    struct umcg_task:           struct umcg_task:    struct umcg_task:
> >  *    state                       state                state
> >  *    api_version                 api_version          api_version
> >  *    ...                         ...                  ...
> >  *    idle_servers_ptr --> head --> idle_servers_ptr --> idle_servers_ptr --> NULL
> >  *    ...                         ...                  ...
> >  *
> >  *
> >  * Due to the way struct umcg_task is aligned, idle_servers_ptr
> >  * is aligned at 8 byte boundary, and so has its first byte as zero
> >  * when it holds a valid pointer.
> >  *
> >  * When pulling idle servers from the list, the kernel marks nodes as
> >  * "deleted" by ORing the node value (the pointer) with 1UL atomically.
> >  * If a node is "deleted" (i.e. its value AND 1UL is not zero),
> >  * the kernel proceeds to the next node.
> >  *
> >  * The kernel checks at most [nr_cpu_ids * 2] first nodes in the list.
> >  *
> >  * It is NOT considered an error if the kernel cannot find an idle
> >  * server.
> >  *
> >  * The userspace is responsible for cleanup/gc (i.e. for actually
> >  * removing nodes marked as "deleted" from the list).
> >  */
> >         uint64_t        idle_servers_ptr;       /* r/w */
>
> I don't understand the reason for using this ad-hoc scheme over a simple
> eventfd to do the job. As I understand it, the goal here is to let servers
> that cannot find workers to run block instead of spinning. Isn't that
> exactly what the eventfd interface is for?

Latency/efficiency: on worker wakeup an idle server can be picked from
the list and context-switched into synchronously, on the same CPU.

Using FDs and select/poll/epoll will add extra layers of abstraction;
synchronous context switches (not yet fully implemented in UMCG) will
most likely be impossible.

This patchset seems much more efficient and lightweight than whatever
can be built on top of FDs.

> Have you considered an idle_fd field, where the kernel writes 1 to the fd
> when a worker is appended to idle_workers_ptr? Servers that don't find
> work can read the fd or alternatively use select/poll/epoll. Multiple
> workers are expected to share fds, either a single global fd, one fd per
> server, or any other combination the scheduler may fancy.
>
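
To make the list protocol a bit more concrete, below is a simplified
sketch of what the userspace side could look like: a server pushing its
own node before blocking, plus the cleanup/gc mentioned in the doc
comment. This is an illustration, not the actual userspace library code:
the helper names and the shared head word are made up, and the gc below
ignores races with concurrent pushers; it is only meant to show how the
next pointers and the "deleted" bit fit together.

#include <stdatomic.h>
#include <stdint.h>

/* Shared list head; workers' idle_servers_ptr fields point at this word. */
static _Atomic uint64_t idle_servers_head;

/*
 * A server that found no work to run pushes its own node (the address of
 * its idle_servers_ptr field) onto the stack before blocking.
 */
static void idle_server_push(_Atomic uint64_t *my_node)
{
        uint64_t old = atomic_load(&idle_servers_head);

        do {
                atomic_store(my_node, old);   /* my "next" = current head */
        } while (!atomic_compare_exchange_weak(&idle_servers_head, &old,
                                        (uint64_t)(uintptr_t)my_node));
}

/*
 * Userspace cleanup/gc: the kernel marks a consumed node by ORing 1UL
 * into the node's value, so unlink every node whose value has bit 0 set.
 */
static void idle_servers_gc(void)
{
        _Atomic uint64_t *prev = &idle_servers_head;
        uint64_t val = atomic_load(prev);

        while (val) {
                _Atomic uint64_t *node =
                        (_Atomic uint64_t *)(uintptr_t)(val & ~1UL);
                uint64_t next = atomic_load(node);

                if (next & 1UL)
                        atomic_store(prev, next & ~1UL); /* splice it out */
                else
                        prev = node;                     /* keep it */

                val = atomic_load(prev);
        }
}

The point is that when a worker wakes up, the kernel only has to pick one
node off this list and context-switch into that server, with no fd write
or poll round-trip in between.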