Mel Gorman <mgorman@xxxxxxx> wrote:

> On Tue, Jul 11, 2017 at 09:09:23PM +0100, Mel Gorman wrote:
>> On Tue, Jul 11, 2017 at 08:18:23PM +0100, Mel Gorman wrote:
>>> I don't think we should be particularly clever about this and instead
>>> just flush the full mm if there is a risk that a parallel batching of
>>> flushes is in progress, resulting in a stale TLB entry being used.
>>> I think tracking mms that are currently batching would end up being
>>> costly in terms of memory, fairly complex, or both. Something like
>>> this?
>>
>> mremap and madvise(DONTNEED) would also need to flush. Memory policies
>> are fine, as a move_pages call that hits the race will simply fail to
>> migrate a page that is being freed, and once migration starts it'll be
>> flushed, so a stale access has no further risk. copy_page_range should
>> also be ok as the old mm is flushed and the new mm cannot have entries
>> yet.
>
> Adding those results in

You are way too fast for me.

> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -637,12 +637,34 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
>  		return false;
>
>  	/* If remote CPUs need to be flushed then defer batch the flush */
> -	if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids)
> +	if (cpumask_any_but(mm_cpumask(mm), get_cpu()) < nr_cpu_ids) {
>  		should_defer = true;
> +		mm->tlb_flush_batched = true;
> +	}

Since mm->tlb_flush_batched is set before the PTE is actually cleared,
it still seems to leave a short window for a race:

  CPU0                                      CPU1
  ----                                      ----
                                            should_defer_flush
                                            => mm->tlb_flush_batched=true
  flush_tlb_batched_pending (another PT)
  => flush TLB
  => mm->tlb_flush_batched=false
                                            ptep_get_and_clear
                                            ...
  flush_tlb_batched_pending (batched PT)
  use the stale PTE
                                            ...
                                            try_to_unmap_flush

IOW, it seems that mm->tlb_flush_batched should be set after the PTE is
cleared (and with some compiler barrier to be on the safe side).

Just to clarify - I am not trying to be annoying, but I considered
building and submitting a patch based on some artifacts of a study I
conducted, and this issue drove me crazy.

One more question, please: how does an elevated page count or even
locking the page help (as you mention in regard to uprobes and ksm)?
Yes, the page will not be reclaimed, but IIUC try_to_unmap is called
before the reference count is frozen, and the page lock is dropped on
each iteration of the loop in shrink_page_list. In this case, it seems
to me that uprobes or ksm may still not flush the TLB.

Thanks,
Nadav
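P.S. To make the ordering I am suggesting concrete, here is a rough
sketch. It is only an illustration against your patch, not the actual
mm/rmap.c code: the function name is made up, it assumes the new
mm->tlb_flush_batched field from your diff, and it uses a plain
barrier() - whether an smp barrier is also needed is a separate
question. Presumably the real change would go into the try_to_unmap
path in mm/rmap.c.

#include <linux/mm.h>
#include <linux/compiler.h>
#include <asm/pgtable.h>

/*
 * Illustrative sketch only: the PTE is cleared *before*
 * mm->tlb_flush_batched is set, so a concurrent
 * flush_tlb_batched_pending() that observes the flag as clear cannot
 * have raced with a not-yet-cleared PTE for this unmap.
 */
static pte_t clear_pte_then_mark_batched(struct mm_struct *mm,
					 unsigned long addr, pte_t *ptep)
{
	pte_t pteval = ptep_get_and_clear(mm, addr, ptep);

	/*
	 * Keep the compiler from reordering the flag store before the
	 * PTE clear.
	 */
	barrier();
	mm->tlb_flush_batched = true;

	/* Dirty/accessed bits are propagated by the caller as usual. */
	return pteval;
}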