Re: [PATCH] mm/mempolicy: Fix redundant check and refine lock protection scope in init_nodemask_of_mempolicy

Gregory Price <gourry@xxxxxxxxxx> · Thu, 21 Nov 2024 09:41:21 -0500

On Thu, Nov 21, 2024 at 02:40:44PM +0100, David Hildenbrand wrote:
> On 21.11.24 13:24, Zhen Ni wrote:
> > 1.Removing redundant checks for current->mempolicy, with a more concise
> > check order.
> > 
> > 2.Using READ_ONCE(current->mempolicy) for safe, single access to
> > current->mempolicy to prevent potential race conditions.
> > 
> > 3.Optimizing the scope of task_lock(current). The lock now only protects
> > the critical section where mempolicy is accessed, reducing the duration
> > the lock is held. This enhances performance by limiting the scope of the
> > lock to only what is necessary.
> >

While optimizing task lock scopes is admirable, we're talking about a scale
of maybe a microsecond length access on a lock that's typically held quite
a bit longer - if it's ever really held given the context of this function.

What you'd actually want to do is the opposite:

task_lock(current);
mempolicy = get_mempolicy();
task_unlock(current);
/* do work */
put_mempolicy();

but even this is only optimizing a small amount of work.

And if you look at what init_nodemask_of_mempolicy is actually used for,
there is no need for this optimization - the task is almost certainly
short lived and intended to just adjust a system control value.

> > Signed-off-by: Zhen Ni <zhen.ni@xxxxxxxxxxxx>
> > ---
> >   mm/mempolicy.c | 7 +++++--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index b646fab3e45e..8bff8830b7e6 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -2132,11 +2132,14 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
> >   {
> >   	struct mempolicy *mempolicy;
> > -	if (!(mask && current->mempolicy))
> > +	if (!mask)> +		return false;
> > +
> > +	mempolicy = READ_ONCE(current->mempolicy);
> > +	if (!mempolicy)
> >   		return false;
> >   	task_lock(current);
> > -	mempolicy = current->mempolicy;
> >   	switch (mempolicy->mode) {
> >   	case MPOL_PREFERRED:
> >   	case MPOL_PREFERRED_MANY:
> 
> (a) current->mempolicy can only change under the task_lock(), see
> do_set_mempolicy().
> 
> (b) current->mempolicy cannot usually NULL once it was none-NULL. There are
> two exceptions:
> 
> copy_process() might set it to NULL when creating the process, before the
> task gets scheduled.
> 
> do_exit() might set it to NULL via mpol_put_task_policy().
> 
> I don't think init_nodemask_of_mempolicy() can be called while our task is
> exiting ...
>

Unless someone put an access to nr_hugepages_mempolicy in the exit path, I
doubt it (also that would probably deadlock :])

> 
> Now, if you read current->mempolicy outside of task_lock() to the
> *dereference* it when it might already have changed+was freed, that's a BUG
> you're introducing.
> 

100% a bug, although since the task mempolicy can't currently change outside
the context of the task - I can't think of a a way poke this particular bug.

~Gregory