Andy Lutomirski <luto@xxxxxxxxxx> wrote:

> On Mon, Jul 17, 2017 at 6:40 PM, Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
>> Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>
>>> On Mon, Jul 17, 2017 at 11:02 AM, Nadav Amit <namit@xxxxxxxxxx> wrote:
>>>> Setting and clearing mm->tlb_flush_pending can be performed by
>>>> multiple threads, since mmap_sem may only be acquired for read in
>>>> task_numa_work(). If this happens, tlb_flush_pending may be cleared
>>>> while one of the threads is still changing PTEs and batching TLB
>>>> flushes.
>>>>
>>>> As a result, TLB flushes can be skipped because the indication of
>>>> pending TLB flushes is lost, for instance due to a race between
>>>> migration and change_protection_range() (just as in the scenario
>>>> that caused the introduction of tlb_flush_pending).
>>>>
>>>> The feasibility of such a scenario was confirmed by adding an
>>>> assertion that tlb_flush_pending is not set by two threads, adding
>>>> artificial latency in change_protection_range(), and using sysctl
>>>> to reduce kernel.numa_balancing_scan_delay_ms.
>>>
>>> This thing is logically a refcount. Should it be refcount_t?
>>
>> I don’t think so. refcount_inc() would WARN_ONCE if the counter is
>> zero before the increase, although that is a valid scenario here.
>
> Hmm. Maybe a refcount that starts at 1? My point is that, if someone
> could force it to overflow, it would be bad. Maybe this isn't worth
> worrying about.

I don’t think it is an issue. At most one task_numa_work() per core can
be running at any given moment.
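
For concreteness, here is a minimal userspace sketch (C11 atomics) of
the counter-based variant being discussed: tlb_flush_pending becomes a
count of threads currently batching PTE changes, rather than a single
set/clear flag, so one thread finishing cannot hide another thread's
still-pending flush. The helper names here mirror the thread's
discussion but are illustrative assumptions, not the merged kernel
patch:

	#include <stdatomic.h>
	#include <stdbool.h>

	struct mm_struct {
		atomic_int tlb_flush_pending;	/* count of in-flight batchers */
	};

	/* Called before a thread starts changing PTEs and batching flushes. */
	static void inc_tlb_flush_pending(struct mm_struct *mm)
	{
		atomic_fetch_add(&mm->tlb_flush_pending, 1);
	}

	/* Called once that thread's batched TLB flush has been performed. */
	static void dec_tlb_flush_pending(struct mm_struct *mm)
	{
		atomic_fetch_sub(&mm->tlb_flush_pending, 1);
	}

	/*
	 * Readers (e.g. the migration path) only care whether any flush
	 * is still pending; the count reads as zero only after every
	 * batcher has decremented.
	 */
	static bool mm_tlb_flush_pending(struct mm_struct *mm)
	{
		return atomic_load(&mm->tlb_flush_pending) > 0;
	}

With a plain flag, two concurrent callers can interleave set/clear so
the flag reads "clear" while one batch is still unflushed; with the
count, the pending indication survives until the last batcher is done.
It also shows why refcount_t is an awkward fit: the counter legitimately
sits at zero between batches, so the 0 -> 1 transition must not warn.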