Re: [RFC] futex2: add NUMA awareness

On 7/14/22 06:18, André Almeida wrote:
> Hi,
> 
> futex2 is an ongoing project with the goal of creating a new interface for
> futex that solves existing issues with the current syscall.
> 
> One of these problems is the lack of NUMA awareness for futex operations.
> This RFC aims to gather feedback on a NUMA interface proposal.
> 
>  * The problem
> 
> futex has a single, global hash table that stores information about current
> waiters, to be queried by wakers. On NUMA machines, this hash table is
> stored in a single node. This means that a process running on other nodes
> will have some overhead when using futex, given that it needs to access
> the table on a different node.
> 
>  * A solution
> 
> On NUMA machines, a table would be allocated per node. Processes
> would then be able to use the local table to avoid sharing data with
> other nodes.
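A minimal userspace model of the per-node layout described above, assuming hypothetical names (`sketch_tables`, `sketch_bucket_for`, `SKETCH_MAX_NODES`) and an illustrative multiplicative hash. In the kernel a bucket would hold a list of waiters and each node's row would live in that node's memory; here a counter stands in for the waiter list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, for illustration only. */
#define SKETCH_MAX_NODES 4
#define SKETCH_HASH_BITS 8
#define SKETCH_HASH_SIZE (1u << SKETCH_HASH_BITS)

/* Stand-in for a futex hash bucket (the real one holds waiter lists). */
struct sketch_bucket {
    unsigned int nr_waiters;
};

/* One table per node instead of one global table. */
static struct sketch_bucket sketch_tables[SKETCH_MAX_NODES][SKETCH_HASH_SIZE];

/* Map a futex address to a bucket index in [0, SKETCH_HASH_SIZE). */
static unsigned int sketch_hash(uintptr_t uaddr)
{
    uint64_t h = (uint64_t)uaddr * UINT64_C(0x9E3779B97F4A7C15);
    return (unsigned int)(h >> (64 - SKETCH_HASH_BITS));
}

/* Find the bucket for uaddr inside the table that belongs to node. */
static struct sketch_bucket *sketch_bucket_for(uintptr_t uaddr, int node)
{
    assert(node >= 0 && node < SKETCH_MAX_NODES);
    return &sketch_tables[node][sketch_hash(uaddr)];
}
```

With this layout, the same address hashes to a different bucket on each node, so a waiter and a waker only find each other if they select the same node's table.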
> 
>  * The interface
> 
> Userspace needs to specify which node's table it would like to use to
> store/query the futex. The common case would be to operate on the current
> node, but some cases could require operating on another one.
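If an application wants to pass an explicit node rather than relying on the kernel to pick the current one, it first needs to know where it is running. Linux's getcpu(2) system call reports the CPU and node of the calling thread; the sketch below calls it through `syscall()` so it does not depend on a recent glibc wrapper. The helper name `current_numa_node` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Return the NUMA node the calling thread is running on, or -1 on error.
 * The value may be stale as soon as it is returned, because the scheduler
 * can migrate the thread to a CPU on another node at any time. */
static int current_numa_node(void)
{
    unsigned int cpu, node;

    if (syscall(SYS_getcpu, &cpu, &node, NULL) != 0)
        return -1;
    return (int)node;
}
```

Because of that staleness, an explicit node number only helps when all threads sharing the futex agree on it, rather than each thread passing its own current node.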
> 
> Before getting to the NUMA part, here is a quick recap of the futex2
> syscall interface:
> 
> futex_wait(void *uaddr, unsigned int val, unsigned int flags,
>            struct timespec *timo)
> 
> futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags)
> 
> struct futex_requeue {
> 	void *uaddr;
> 	unsigned int flags;
> };
> 
> futex_requeue(struct futex_requeue *rq1, struct futex_requeue *rq2,
> 	      unsigned int nr_wake, unsigned int nr_requeue,
> 	      u64 cmpval, unsigned int flags)
> 
> 
> As futex_requeue() already has 6 arguments, the maximum a syscall can
> take, we can't add another argument for the node ID; we need to pack it
> into a struct instead. So then we have:
> 
> struct futexX_numa {
>         __uX value;
>         __sX hint;
> };
> 
> Where X can be 8, 16, 32 or 64 (futex2 supports variable-sized futexes).
> `value` is the futex value and `hint` can be -1 for the current node, or
> [0, MAX_NUMA_NODES) to specify a node. Example:
> 
> struct futex32_numa f = {.value = 0, .hint = -1};
> 
> ...
> 
> futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
> 
> Then &f would be used as the futex address, as expected, and the futex
> would be stored in the current node's table. If an app expects calls from
> different nodes, it should instead specify a node explicitly, for instance:
> 
> struct futex32_numa f = {.value = 0, .hint = 2};
> 
> For non-NUMA apps, a call without the FUTEX_NUMA flag would just use the
> first node by default.
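The node-selection rules above can be condensed into a small sketch: the X = 32 instance of `struct futexX_numa`, plus a function deciding which node's table a call would use. `SKETCH_FUTEX_NUMA` and `SKETCH_MAX_NODES` are hypothetical stand-ins, since the RFC names `FUTEX_NUMA` and `MAX_NUMA_NODES` without giving values, and the caller's node is passed in explicitly because this runs in userspace.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values standing in for FUTEX_NUMA and MAX_NUMA_NODES. */
#define SKETCH_FUTEX_NUMA 0x1u
#define SKETCH_MAX_NODES  4

/* The X = 32 instance of struct futexX_numa. */
struct futex32_numa {
    uint32_t value; /* the futex word itself */
    int32_t  hint;  /* -1: current node; [0, SKETCH_MAX_NODES): that node */
};

/* Two 32-bit fields, so no padding is expected. */
static_assert(sizeof(struct futex32_numa) == 8, "unexpected padding");

/* Pick the node whose table a call would use:
 *  - no FUTEX_NUMA flag: uaddr is a plain futex word, use node 0;
 *  - hint == -1: use the caller's current node;
 *  - otherwise use the hinted node, or return -1 if it is out of range. */
static int sketch_resolve_node(const void *uaddr, unsigned int flags,
                               int current_node)
{
    const struct futex32_numa *f;

    if (!(flags & SKETCH_FUTEX_NUMA))
        return 0;
    f = uaddr;
    if (f->hint == -1)
        return current_node;
    if (f->hint < 0 || f->hint >= SKETCH_MAX_NODES)
        return -1;
    return f->hint;
}
```

Note that in the plain (non-NUMA) case the function never reads a hint, since `uaddr` then points at a bare futex word rather than the struct.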
> 
> Feedback? Who else should I CC?

Just a few questions:

Do I understand correctly that notifiers won't be able to wake up
waiters unless they know on which node they are waiting?

Is it possible to wait on a futex on different nodes?

Is it possible to wake waiters on a futex on all nodes? When a single
(or N, where N is not "all") waiter is woken, which node is selected? Is
there a rotation of nodes, so that nodes are not skewed in terms of
notified waiters?
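To make the last question concrete, here is one hypothetical rotation policy (not something the RFC proposes): keep a per-futex cursor and wake at most one waiter per node in round-robin order, resuming where the previous wake stopped. All names (`sketch_futex_waiters`, `sketch_wake_rr`, `SKETCH_NR_NODES`) are illustrative, and per-node waiter queues are reduced to counters.

```c
#include <assert.h>

#define SKETCH_NR_NODES 4

/* Waiters queued on each node's table for one futex, plus a cursor
 * remembering which node should be served first on the next wake. */
struct sketch_futex_waiters {
    unsigned int queued[SKETCH_NR_NODES];
    unsigned int next_node;
};

/* Wake up to nr_wake waiters, one node at a time in round-robin order,
 * starting at next_node. Stops early once a full pass finds every node
 * empty. Returns the number of waiters woken. */
static unsigned int sketch_wake_rr(struct sketch_futex_waiters *w,
                                   unsigned int nr_wake)
{
    unsigned int woken = 0, idle = 0;
    unsigned int node = w->next_node;

    while (woken < nr_wake && idle < SKETCH_NR_NODES) {
        if (w->queued[node] > 0) {
            w->queued[node]--;
            woken++;
            idle = 0;
        } else {
            idle++;
        }
        node = (node + 1) % SKETCH_NR_NODES;
    }
    w->next_node = node;
    return woken;
}
```

Whether something like this is worth the cost depends on the first question: a waker would have to visit every node's table, which gives back part of the locality the per-node tables were meant to provide.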


