Re: [PATCH 2/6] futex: Implement FUTEX2_NUMA

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Mon, 28 Oct 2024 10:46:18 +0100

On Fri, Oct 25, 2024 at 12:28:54PM -0700, Christoph Lameter (Ampere) wrote:
> On Fri, 25 Oct 2024, Peter Zijlstra wrote:
> 
> > Extend the futex2 interface to be numa aware.
> >
> > When FUTEX2_NUMA is specified for a futex, the user value is extended
> > to two words (of the same size). The first is the user value we all
> > know, the second one will be the node to place this futex on.
> >
> >   struct futex_numa_32 {
> > 	u32 val;
> > 	u32 node;
> >   };
> >
> > When node is set to ~0, WAIT will set it to the current node_id such
> > that WAKE knows where to find it. If userspace corrupts the node value
> > between WAIT and WAKE, the futex will not be found and no wakeup will
> > happen.
> >
> > When FUTEX2_NUMA is not set, the node is simply an extension of the
> > hash, such that traditional futexes are still interleaved over the
> > nodes.
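The following is a minimal userspace sketch of the node handling described above, for illustration only. It assumes a hypothetical 4-node machine, a hypothetical per-node bucket count, a made-up multiplicative hash standing in for the kernel's futex hash, and a fixed `current_node` in place of the real current node id. None of these names are kernel APIs. The sketch shows two things: WAIT filling in a ~0 node with the current node, and a non-FUTEX2_NUMA futex taking its node from hash bits so that futexes interleave over the nodes.

```c
#include <assert.h>
#include <stdint.h>

/* Layout from the patch description: value followed by node, same width. */
struct futex_numa_32 {
	uint32_t val;
	uint32_t node;
};

#define FUTEX_NO_NODE     (~0U) /* "node is set to ~0" */
#define NR_NODES          4     /* hypothetical machine size */
#define BUCKETS_PER_NODE  16    /* hypothetical per-node hash size */

/* Hypothetical stand-in for the node the waiting task is running on. */
static uint32_t current_node = 2;

/* Made-up multiplicative hash of the futex address, not the kernel's hash. */
static uint32_t hash_addr(const void *addr)
{
	return (uint32_t)((uintptr_t)addr >> 2) * 2654435761u;
}

/* FUTEX2_NUMA WAIT: resolve ~0 to the current node and store it in the
 * futex's node word, so a later WAKE reads the same node back. */
static uint32_t numa_wait_node(struct futex_numa_32 *f)
{
	if (f->node == FUTEX_NO_NODE)
		f->node = current_node;
	return f->node;
}

/* Without FUTEX2_NUMA: the node is just more hash bits, which spreads
 * traditional futexes over all nodes. */
static uint32_t plain_node(const void *addr)
{
	return hash_addr(addr) % NR_NODES;
}

/* Global bucket index: pick the node's table, then a bucket inside it. */
static uint32_t futex_bucket(uint32_t node, const void *addr)
{
	assert(node < NR_NODES);
	return node * BUCKETS_PER_NODE + hash_addr(addr) % BUCKETS_PER_NODE;
}
```

With a futex initialised to { .val = 0, .node = ~0U }, the first numa_wait_node() call stores current_node in the node word. A later WAKE computing futex_bucket(f.node, &f) therefore lands in the same per-node bucket, which is also why corrupting the node word between WAIT and WAKE makes the wakeup miss.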
> 
> 
> Would it be possible to follow the NUMA memory policy set up for a task
> when making these decisions? We may not need a separate FUTEX2_NUMA
> option. There are supportive functions in mm/mempolicy.c that will yield
> a node for the futex logic to use.

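Using get_task_policy() seems very dangerous to me. It is explicitly
possible for different tasks in a process to have different policies,
which means (private) futexes would fail to work correctly: a waiter and
a waker with different policies would pick different nodes, and the wake
would never find the waiter.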
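A toy model of that failure mode follows. It assumes hypothetical per-task `policy_node` values and a single-slot wait queue per node; neither reflects how the kernel actually stores waiters. A waiter queued on the node chosen by its own policy is invisible to a waker whose policy selects a different node. A node read from the futex word itself is the same for every task that looks at it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define NR_NODES 2

/* Hypothetical per-task state: each thread may carry its own policy. */
struct task {
	uint32_t policy_node; /* node a per-task policy would select */
};

/* Hypothetical per-node wait queues, reduced to one waiter address each. */
static const void *waiting_on[NR_NODES];

/* Queue a waiter for uaddr on the given node. */
static void enqueue(uint32_t node, const void *uaddr)
{
	waiting_on[node] = uaddr;
}

/* Look for a waiter on the given node; true means a wakeup happened. */
static bool dequeue(uint32_t node, const void *uaddr)
{
	if (waiting_on[node] != uaddr)
		return false;
	waiting_on[node] = NULL;
	return true;
}
```

Two threads of one process with policy_node 0 and 1 miss each other on the same futex word. Passing a node taken from the futex itself (as FUTEX2_NUMA does) to both enqueue() and dequeue() always matches.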

We need something that is process-wide consistent -- like the vma
policies. Except, currently, those are too expensive to access readily.