----- On Jan 28, 2019, at 5:39 PM, paulmck paulmck@xxxxxxxxxxxxx wrote:

> On Mon, Jan 28, 2019 at 05:07:07PM -0500, Mathieu Desnoyers wrote:
>> Jann Horn identified a racy access to p->mm in the global expedited
>> command of the membarrier system call.
>>
>> The suggested fix is to hold the task_lock() around the accesses to
>> p->mm and to the mm_struct membarrier_state field to guarantee the
>> existence of the mm_struct.
>>
>> Link: https://lore.kernel.org/lkml/CAG48ez2G8ctF8dHS42TF37pThfr3y0RNOOYTmxvACm4u8Yu3cw@xxxxxxxxxxxxxx
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
>> Tested-by: Jann Horn <jannh@xxxxxxxxxx>
>> CC: Jann Horn <jannh@xxxxxxxxxx>
>> CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> CC: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>> CC: Ingo Molnar <mingo@xxxxxxxxxx>
>> CC: Andrea Parri <parri.andrea@xxxxxxxxx>
>> CC: Andy Lutomirski <luto@xxxxxxxxxx>
>> CC: Avi Kivity <avi@xxxxxxxxxxxx>
>> CC: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
>> CC: Boqun Feng <boqun.feng@xxxxxxxxx>
>> CC: Dave Watson <davejwatson@xxxxxx>
>> CC: David Sehr <sehr@xxxxxxxxxx>
>> CC: H. Peter Anvin <hpa@xxxxxxxxx>
>> CC: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> CC: Maged Michael <maged.michael@xxxxxxxxx>
>> CC: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
>> CC: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>> CC: Paul Mackerras <paulus@xxxxxxxxx>
>> CC: Russell King <linux@xxxxxxxxxxxxxxx>
>> CC: Will Deacon <will.deacon@xxxxxxx>
>> CC: stable@xxxxxxxxxxxxxxx # v4.16+
>> CC: linux-api@xxxxxxxxxxxxxxx
>> ---
>>  kernel/sched/membarrier.c | 27 +++++++++++++++++++++------
>>  1 file changed, 21 insertions(+), 6 deletions(-)
>>
>> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
>> index 76e0eaf4654e..305fdcc4c5f7 100644
>> --- a/kernel/sched/membarrier.c
>> +++ b/kernel/sched/membarrier.c
>> @@ -81,12 +81,27 @@ static int membarrier_global_expedited(void)
>>
>>  		rcu_read_lock();
>>  		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>> -		if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
>> -				   MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
>> -			if (!fallback)
>> -				__cpumask_set_cpu(cpu, tmpmask);
>> -			else
>> -				smp_call_function_single(cpu, ipi_mb, NULL, 1);
>> +		/*
>> +		 * Skip this CPU if the runqueue's current task is NULL or if
>> +		 * it is a kernel thread.
>> +		 */
>> +		if (p && READ_ONCE(p->mm)) {
>> +			bool mm_match;
>> +
>> +			/*
>> +			 * Read p->mm and access membarrier_state while holding
>> +			 * the task lock to ensure existence of mm.
>> +			 */
>> +			task_lock(p);
>> +			mm_match = p->mm && (atomic_read(&p->mm->membarrier_state) &
>
> Are we guaranteed that this p->mm will be the same as the one loaded via
> READ_ONCE() above?  Either way, wouldn't it be better to READ_ONCE() it a
> single time and use the same value everywhere?

The first READ_ONCE() above is _outside_ of the task_lock() critical
section. Those two accesses _can_ load two different pointers, and this is
why we need to re-read the p->mm pointer within the task_lock() critical
section to ensure the existence of the mm_struct that we use.

If we move the READ_ONCE() inside the task_lock() critical section, we end
up uselessly taking the lock before we can skip kernel threads. If we
instead use the value loaded by the READ_ONCE() outside the task_lock(),
then p->mm can be updated between that load and the access to the
mm_struct contents within the task_lock(), which is racy and does not
guarantee the existence of the mm_struct.

Or am I missing your point ?
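To make the intended ordering concrete, here is a minimal userspace sketch
of the same unlocked-skip-then-recheck-under-lock pattern. This is only an
illustration: mm_like, shared_mm, task_lock, teardown_mm() and
mm_state_matches() are names invented for the sketch (not kernel symbols),
and a pthread mutex stands in for task_lock().

/* Build with: cc -std=c11 -pthread sketch.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define STATE_GLOBAL_EXPEDITED	0x1

struct mm_like {
	atomic_int membarrier_state;
};

/* Stands in for p->mm; an "exiting" thread may clear it at any time. */
static struct mm_like *_Atomic shared_mm;
static pthread_mutex_t task_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for task exit: the pointer is cleared while holding the
 * lock, so anyone holding the lock sees either the live object or NULL,
 * never a freed object.
 */
static void teardown_mm(void)
{
	pthread_mutex_lock(&task_lock);
	struct mm_like *mm = atomic_exchange(&shared_mm, NULL);
	pthread_mutex_unlock(&task_lock);
	free(mm);
}

static bool mm_state_matches(void)
{
	bool match = false;

	/*
	 * Unlocked peek, analogous to READ_ONCE(p->mm): used only to
	 * skip taking the lock when the pointer is already NULL.  The
	 * loaded value is deliberately never dereferenced, because it
	 * can become stale immediately after the load.
	 */
	if (!atomic_load_explicit(&shared_mm, memory_order_relaxed))
		return false;

	/*
	 * Re-read under the lock before dereferencing: holding the
	 * lock is what guarantees the object still exists while its
	 * state field is examined.
	 */
	pthread_mutex_lock(&task_lock);
	struct mm_like *mm = atomic_load_explicit(&shared_mm,
						  memory_order_relaxed);
	if (mm)
		match = atomic_load(&mm->membarrier_state) &
			STATE_GLOBAL_EXPEDITED;
	pthread_mutex_unlock(&task_lock);

	return match;
}

int main(void)
{
	struct mm_like *mm = calloc(1, sizeof(*mm));
	atomic_init(&mm->membarrier_state, STATE_GLOBAL_EXPEDITED);
	atomic_store(&shared_mm, mm);

	printf("match before teardown: %d\n", mm_state_matches());
	teardown_mm();
	printf("match after teardown:  %d\n", mm_state_matches());
	return 0;
}

The second call prints 0: the unlocked peek may still observe a non-NULL
pointer concurrently with teardown, but the recheck under the lock is
what decides whether the object can actually be dereferenced.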
Thanks,

Mathieu

>
> 							Thanx, Paul
>
>> +					MEMBARRIER_STATE_GLOBAL_EXPEDITED);
>> +			task_unlock(p);
>> +			if (mm_match) {
>> +				if (!fallback)
>> +					__cpumask_set_cpu(cpu, tmpmask);
>> +				else
>> +					smp_call_function_single(cpu, ipi_mb, NULL, 1);
>> +			}
>>  		}
>>  		rcu_read_unlock();
>>  	}
>> --
>> 2.17.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com