On 7/14/22 06:18, André Almeida wrote:
> Hi,
>
> futex2 is an ongoing project with the goal to create a new interface for
> futex that solves ongoing issues with the current syscall.
>
> One of these problems is the lack of NUMA awareness for futex operations.
> This RFC is aimed at gathering feedback on a NUMA interface proposal.
>
> * The problem
>
> futex has a single, global hash table to store information of current
> waiters to be queried by wakers. This hash table is stored in a single
> node in non-uniform machines. This means that a process running in other
> nodes will have some overhead using futex, given that it will need to
> access the table in a different node.
>
> * A solution
>
> For NUMA machines, a table would be allocated per node. Processes
> then would be able to use the local table to avoid sharing data with
> other nodes.
>
> * The interface
>
> Userspace needs to specify which node it would like to use to store/query
> the futex table. The common case would be to operate on the current
> node, but some cases could require operating on another one.
>
> Before getting to the NUMA part, a quick recap of the syscalls interface
> of futex2:
>
>   futex_wait(void *uaddr, unsigned int val, unsigned int flags,
>              struct timespec *timo)
>
>   futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags)
>
>   struct futex_requeue {
>           void *uaddr;
>           unsigned int flags;
>   };
>
>   futex_requeue(struct futex_requeue *rq1, struct futex_requeue *rq2,
>                 unsigned int nr_wake, unsigned int nr_requeue,
>                 u64 cmpval, unsigned int flags)
>
>
> As requeue already has 6 arguments, we can't add an argument for the
> node ID, we need to pack it in a struct. So then we have
>
>   struct futexX_numa {
>           __uX value;
>           __sX hint;
>   };
>
> Where X can be 8, 16, 32 or 64 (futex2 supports variable sized futexes).
> `value` is the futex value and `hint` can be -1 for the current node, or
> [0, MAX_NUMA_NODES) to specify a node.
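
To check my reading of the layout: for X = 32 I would expect something like the sketch below. The value chosen for MAX_NUMA_NODES is arbitrary and futex_numa_resolve_node() is a made-up helper; neither is taken from the proposal, they only spell out the `hint` rule as I understand it.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Illustrative only: the RFC does not give a value for MAX_NUMA_NODES. */
#define MAX_NUMA_NODES 8

/* X = 32 instance of the proposed futexX_numa layout. */
struct futex32_numa {
	uint32_t value;	/* the futex word */
	int32_t  hint;	/* -1: current node, otherwise a node ID */
};

/* Layout checks; no padding is expected on common ABIs. */
static_assert(sizeof(struct futex32_numa) == 8, "unexpected padding");
static_assert(offsetof(struct futex32_numa, hint) == 4, "hint follows value");

/*
 * Hypothetical helper: map a hint to the node whose table would be used.
 * Returns -1 for a hint outside [-1, MAX_NUMA_NODES).
 */
static inline int futex_numa_resolve_node(int32_t hint, int current_node)
{
	if (hint == -1)
		return current_node;
	if (hint >= 0 && hint < MAX_NUMA_NODES)
		return hint;
	return -1;
}
```

With this reading, a hint of -1 resolves to a different node depending on where the caller happens to run, which is what my questions below are about.
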
> Example:
>
>   struct futex32_numa f = {.value = 0, .hint = -1};
>
>   ...
>
>   futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
>
> Then &f would be used as the futex address, as expected, and this would
> be used for the current node. If an app is expecting to have calls from
> different nodes then it should do for instance:
>
>   struct futex32_numa f = {.value = 0, .hint = 2};
>
> For non-NUMA apps, a call without FUTEX_NUMA flag would just use the
> first node as default.
>
> Feedback? Who else should I CC?

Just a few questions:

Do I understand correctly that notifiers won't be able to wake up
waiters unless they know on which node they are waiting?

Is it possible to wait on a futex on different nodes?

Is it possible to wake waiters on a futex on all nodes? When a single
(or N, where N is not "all") waiter is woken, which node is selected? Is
there a rotation of nodes, so that nodes are not skewed in terms of
notified waiters?
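
To make the first question concrete, here is a toy userspace model of per-node wait tables as I read the proposal. It is not kernel code: the toy_* names, NR_NODES and MAX_WAITERS are all invented for illustration, and a real implementation would hash the address rather than scan an array.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES    4	/* illustrative node count */
#define MAX_WAITERS 16	/* illustrative per-node capacity */

struct toy_waiter {
	const void *uaddr;	/* futex address being waited on */
	int         in_use;	/* slot holds a queued waiter */
};

/* One wait table per node, standing in for the per-node hash tables. */
static struct toy_waiter toy_table[NR_NODES][MAX_WAITERS];

/* Queue a waiter for uaddr on `node`; returns 0, or -1 if the table is full. */
static inline int toy_wait(int node, const void *uaddr)
{
	assert(node >= 0 && node < NR_NODES);
	for (size_t i = 0; i < MAX_WAITERS; i++) {
		if (!toy_table[node][i].in_use) {
			toy_table[node][i].uaddr = uaddr;
			toy_table[node][i].in_use = 1;
			return 0;
		}
	}
	return -1;
}

/* Wake up to nr_wake waiters on uaddr, looking only at `node`'s table. */
static inline int toy_wake(int node, const void *uaddr, int nr_wake)
{
	int woken = 0;

	assert(node >= 0 && node < NR_NODES);
	for (size_t i = 0; i < MAX_WAITERS && woken < nr_wake; i++) {
		if (toy_table[node][i].in_use &&
		    toy_table[node][i].uaddr == uaddr) {
			toy_table[node][i].in_use = 0;
			woken++;
		}
	}
	return woken;
}
```

In this model, a waiter queued with toy_wait(2, &word) is not found by toy_wake(0, &word, 1), only by toy_wake(2, &word, 1). If the real interface behaves the same way with hint = -1, a waker running on another node would miss the waiter.
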