[RFC] futex2: add NUMA awareness

Hi,

futex2 is an ongoing project whose goal is to create a new interface for
futex that solves known issues with the current syscall.

One of these problems is the lack of NUMA awareness for futex operations.
This RFC aims to gather feedback on a proposed NUMA interface.

 * The problem

futex has a single, global hash table that stores information about
current waiters, to be queried by wakers. On non-uniform machines this
hash table lives on a single node. This means that a process running on
any other node pays some overhead when using futex, given that it needs
to access the table on a different node.

 * A solution

For NUMA machines, one table would be allocated per node. Processes
would then be able to use the local table and avoid sharing data with
other nodes.
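
To make the idea concrete, here is a rough userspace sketch of a per-node
table lookup. This is not kernel code and not part of the proposal: the
names futex_bucket, futex_bucket_for() and hash_addr(), the table sizes
and the hash function are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NUMA_NODES   4    /* hypothetical node count */
#define BUCKETS_PER_NODE 256  /* hypothetical table size */

static_assert((BUCKETS_PER_NODE & (BUCKETS_PER_NODE - 1)) == 0,
	      "BUCKETS_PER_NODE must be a power of two");

/* Placeholder bucket; a real one would hold a lock and a waiter list. */
struct futex_bucket {
	int nr_waiters;
};

/* One table per node instead of a single global table. */
static struct futex_bucket tables[MAX_NUMA_NODES][BUCKETS_PER_NODE];

/* Illustrative multiplicative hash of the futex address. */
static size_t hash_addr(uintptr_t uaddr)
{
	uint64_t h = (uint64_t)uaddr * 0x9E3779B97F4A7C15ull;

	return (size_t)(h >> 32) & (BUCKETS_PER_NODE - 1);
}

/* Return the bucket for uaddr in the given node's table, or NULL. */
static struct futex_bucket *futex_bucket_for(int node, uintptr_t uaddr)
{
	if (node < 0 || node >= MAX_NUMA_NODES)
		return NULL;

	return &tables[node][hash_addr(uaddr)];
}
```

Waiter and waker must agree on the node, otherwise they would look into
different tables and the wake would never find the waiter.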

 * The interface

Userspace needs to specify which node's futex table it would like to use
for storing/querying. The common case would be to operate on the current
node, but some cases could require operating on another one.

Before getting to the NUMA part, a quick recap of the futex2 syscall
interface:

futex_wait(void *uaddr, unsigned int val, unsigned int flags,
           struct timespec *timo)

futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags)

struct futex_requeue {
	void *uaddr;
	unsigned int flags;
};

futex_requeue(struct futex_requeue *rq1, struct futex_requeue *rq2,
	      unsigned int nr_wake, unsigned int nr_requeue,
	      u64 cmpval, unsigned int flags)


As requeue already has 6 arguments, we can't add an argument for the
node ID, so we need to pack it in a struct. So then we have

struct futexX_numa {
        __uX value;
        __sX hint;
};

Where X can be 8, 16, 32 or 64 (futex2 supports variable sized futexes).
`value` is the futex value and `hint` can be -1 for the current node, or
[0, MAX_NUMA_NODES) to specify a node. Example:

struct futex32_numa f = {.value = 0, .hint = -1};

...

futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);

Then &f would be used as the futex address, as expected, and the
operation would use the table of the current node. If an app expects to
have calls from different nodes, then it should name a fixed node, for
instance:

struct futex32_numa f = {.value = 0, .hint = 2};

For non-NUMA apps, a call without the FUTEX_NUMA flag would just use the
first node as default.
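
Putting the rules above together, node selection could look roughly like
the sketch below. It is only an illustration in plain userspace C: the
flag values, the MAX_NUMA_NODES value, the futex_resolve_node() helper
and its "return -1 on a bad hint" convention are assumptions, not part of
the proposal.

```c
#include <assert.h>
#include <stdint.h>

#define FUTEX_32       0x2u  /* hypothetical flag value */
#define FUTEX_NUMA     0x8u  /* hypothetical flag value */
#define MAX_NUMA_NODES 4     /* hypothetical node count */

/* 32-bit instance of struct futexX_numa (X = 32). */
struct futex32_numa {
	uint32_t value;
	int32_t hint;
};

static_assert(sizeof(struct futex32_numa) == 8,
	      "hint is expected to directly follow value");

/*
 * Pick the node whose table should be used for this futex.
 * current_node stands in for the node the caller is running on.
 * Returns the node ID, or -1 if the hint is out of range.
 */
static int futex_resolve_node(const struct futex32_numa *f,
			      unsigned int flags, int current_node)
{
	/* Without FUTEX_NUMA there is no hint to read: use the first node. */
	if (!(flags & FUTEX_NUMA))
		return 0;

	/* hint == -1 means "the node I am currently on". */
	if (f->hint == -1)
		return current_node;

	/* Otherwise the hint must be in [0, MAX_NUMA_NODES). */
	if (f->hint < 0 || f->hint >= MAX_NUMA_NODES)
		return -1;

	return f->hint;
}
```

With hint = -1, a waiter and a waker running on different nodes would
resolve to different tables, which is why an app that expects cross-node
calls should pin the futex to an explicit node.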

Feedback? Who else should I CC?

Thanks,
	André


