* Peter Zijlstra:

> So how about we introduce new syscalls:
>
>   sys_futex_wait(void *uaddr, unsigned long val, unsigned long flags, ktime_t *timo);
>
>   struct futex_wait {
>           void *uaddr;
>           unsigned long val;
>           unsigned long flags;
>   };
>   sys_futex_waitv(struct futex_wait *waiters, unsigned int nr_waiters,
>                   unsigned long flags, ktime_t *timo);
>
>   sys_futex_wake(void *uaddr, unsigned int nr, unsigned long flags);
>
>   sys_futex_cmp_requeue(void *uaddr1, void *uaddr2, unsigned int nr_wake,
>                         unsigned int nr_requeue, unsigned long cmpval, unsigned long flags);
>
> Where flags:
>
>  - has 2 bits for size: 8,16,32,64
>  - has 2 more bits for size (requeue) ??
>  - has ... bits for clocks
>  - has private/shared
>  - has numa

What's the actual type of *uaddr?  Does it vary by size (which I assume
is in bits?)?  Are there alignment constraints?

These system calls seemed to be type-polymorphic still, which is
problematic for defining a really nice C interface.  I would really like
to have a strongly typed interface for this, with a nice struct futex
wrapper type (even if it means that we need four of them).

Will all architectures support all sizes?  If not, how do we probe which
size/flags combinations are supported?

> For NUMA I propose that when NUMA_FLAG is set, uaddr-4 will be 'int
> node_id', with the following semantics:
>
>  - on WAIT, node_id is read and when 0 <= node_id <= nr_nodes, is
>    directly used to index into per-node hash-tables. When -1, it is
>    replaced by the current node_id and an smp_mb() is issued before we
>    load and compare the @uaddr.
>
>  - on WAKE/REQUEUE, it is an immediate index.

Does this mean the first waiter determines the NUMA index, and all
future waiters use the same chain even if they are on different nodes?

I think documenting this as a node index would be a mistake.  It could
be an arbitrary hint for locating the corresponding kernel data
structures.

> Any invalid value with result in EINVAL.

Using uaddr-4 is slightly tricky with a 64-bit futex value, due to the
need to maintain alignment and avoid padding.

Thanks,
Florian
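
To make the alignment concern concrete, here is a minimal sketch in C of
how typed wrapper structs might lay out node_id relative to the futex
word.  All struct and function names below are hypothetical and are not
part of the proposal; the sketch assumes node_id is a 32-bit int stored
in the four bytes immediately before the futex value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper for a 32-bit futex: node_id lands at uaddr - 4
 * without any padding. */
struct futex32_numa {
	int32_t  node_id;
	uint32_t value;		/* uaddr would point here */
};

/* Naive 64-bit variant.  On ABIs that align uint64_t to 8 bytes, the
 * compiler inserts 4 bytes of padding after node_id, so uaddr - 4 is
 * padding and node_id actually sits at uaddr - 8. */
struct futex64_numa_naive {
	int32_t  node_id;
	uint64_t value;
};

/* One possible workaround (an assumption, not the proposal): make the
 * padding explicit and put it first, so node_id is adjacent to value. */
struct futex64_numa_padded {
	int32_t  unused;
	int32_t  node_id;
	uint64_t value;		/* uaddr would point here */
};

/* Distance in bytes from node_id to the futex word for each layout. */
size_t gap32(void)
{
	return offsetof(struct futex32_numa, value) -
	       offsetof(struct futex32_numa, node_id);
}

size_t gap64_naive(void)
{
	return offsetof(struct futex64_numa_naive, value) -
	       offsetof(struct futex64_numa_naive, node_id);
}

size_t gap64_padded(void)
{
	return offsetof(struct futex64_numa_padded, value) -
	       offsetof(struct futex64_numa_padded, node_id);
}

/* The padded layout keeps node_id exactly 4 bytes before the value. */
static_assert(offsetof(struct futex64_numa_padded, value) -
	      offsetof(struct futex64_numa_padded, node_id) == 4,
	      "node_id must sit directly before the futex word");
```

The gap in the naive layout depends on the ABI's alignment of uint64_t
inside structs, which is exactly why a uaddr-4 convention needs either
explicit leading padding or a different placement rule for 64-bit
futexes.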