Re: [PATCH v6 updated 9/11] mm/mremap: Fix race between mremap and pageout

"A lneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> · Tue, 25 May 2021 14:14:09 +0530

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Mon, May 24, 2021 at 3:38 AM Aneesh Kumar K.V
> <aneesh.kumar@xxxxxxxxxxxxx> wrote:
>>
>> Avoid the above race with MOVE_PMD by holding pte ptl in mremap and waiting for
>> parallel pagetable walk to finish operating on pte before updating new_pmd
>
> Ack on the concept.

Should we worry about the below race. The window would be small

CPU 1				CPU 2					CPU 3

mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one

mmap_write_lock_killable()

				addr = old_addr

lock(pmd_ptl)
pmd = *old_pmd
pmd_clear(old_pmd)
flush_tlb_range(old_addr)

lock(pte_ptl)
*new_pmd = pmd
unlock(pte_ptl)

unlock(pmd_ptl)
				lock(pte_ptl)
									*new_addr = 10; and fills
									TLB with new addr
									and old pfn

				ptep_clear_flush(old_addr)
				old pfn is free.
									Stale TLB entry

>
> However, not so much on the patch.
>
> Odd whitespace change:
>
>> @@ -254,6 +254,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>>         if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
>>                 return false;
>>
>> +
>>         /*
>>          * We don't have to worry about the ordering of src and dst
>>          * ptlocks because exclusive mmap_lock prevents deadlock.
>
> And new optimization for empty pmd, which seems unrelated to the
> change and should presumably be separate:

That was added that we can safely do pte_lockptr() below

>
>> @@ -263,6 +264,10 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>>         if (new_ptl != old_ptl)
>>                 spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
>>
>> +       if (pmd_none(*old_pmd))
>> +               goto unlock_out;
>> +
>> +       pte_ptl = pte_lockptr(mm, old_pmd);
>>         /* Clear the pmd */
>>         pmd = *old_pmd;
>>         pmd_clear(old_pmd);
>
> And also, why does the above assign 'pte_ptl' without using it, when
> the actual use is ten lines further down?

So that we fetch the pte_ptl before we clear old_pmd. 

-aneesh