Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Mon, May 24, 2021 at 3:38 AM Aneesh Kumar K.V
> <aneesh.kumar@xxxxxxxxxxxxx> wrote:
>>
>> Avoid the above race with MOVE_PMD by holding pte ptl in mremap and waiting for
>> parallel pagetable walk to finish operating on pte before updating new_pmd
>
> Ack on the concept.

Should we worry about the race below? The window would be small.

CPU 1                           CPU 2                           CPU 3

mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one

mmap_write_lock_killable()

                                addr = old_addr
lock(pmd_ptl)
pmd = *old_pmd
pmd_clear(old_pmd)
flush_tlb_range(old_addr)

lock(pte_ptl)
*new_pmd = pmd
unlock(pte_ptl)
unlock(pmd_ptl)

                                lock(pte_ptl)
                                                                *new_addr = 10;
                                                                and fills TLB with
                                                                new addr and old pfn

                                ptep_clear_flush(old_addr)
                                old pfn is free.
                                                                Stale TLB entry

> However, not so much on the patch.
>
> Odd whitespace change:
>
>> @@ -254,6 +254,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>>  	if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
>>  		return false;
>>
>> +
>>  	/*
>>  	 * We don't have to worry about the ordering of src and dst
>>  	 * ptlocks because exclusive mmap_lock prevents deadlock.
>
> And new optimization for empty pmd, which seems unrelated to the
> change and should presumably be separate:

That was added so that we can safely do pte_lockptr() below.

>> @@ -263,6 +264,10 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
>>  	if (new_ptl != old_ptl)
>>  		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
>>
>> +	if (pmd_none(*old_pmd))
>> +		goto unlock_out;
>> +
>> +	pte_ptl = pte_lockptr(mm, old_pmd);
>>  	/* Clear the pmd */
>>  	pmd = *old_pmd;
>>  	pmd_clear(old_pmd);
>
> And also, why does the above assign 'pte_ptl' without using it, when
> the actual use is ten lines further down?

So that we fetch pte_ptl before we clear old_pmd.

-aneesh
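
P.S. To make the locking order concrete, here is a condensed sketch of how
move_normal_pmd() would read with this patch applied. Only the quoted hunks
above are verbatim from the patch; the surrounding lines, including the exact
placement of the unlock_out label and the flush ordering, are my
reconstruction and may differ in detail from the actual series:

static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
			    unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
{
	spinlock_t *old_ptl, *new_ptl, *pte_ptl;
	struct mm_struct *mm = vma->vm_mm;
	pmd_t pmd;

	if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
		return false;

	/*
	 * We don't have to worry about the ordering of src and dst
	 * ptlocks because exclusive mmap_lock prevents deadlock.
	 */
	old_ptl = pmd_lock(mm, old_pmd);
	new_ptl = pmd_lockptr(mm, new_pmd);
	if (new_ptl != old_ptl)
		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);

	/* An empty pmd has no pte page, so pte_lockptr() on it is not safe. */
	if (pmd_none(*old_pmd))
		goto unlock_out;

	/* Fetch the pte ptl while old_pmd still points to the pte page. */
	pte_ptl = pte_lockptr(mm, old_pmd);

	/* Clear the pmd */
	pmd = *old_pmd;
	pmd_clear(old_pmd);

	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);

	/*
	 * Take the pte ptl so that any walker (e.g. try_to_unmap_one())
	 * that found the pte page through the old pmd has finished with
	 * it before the entries reappear at new_pmd.
	 */
	spin_lock(pte_ptl);
	pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
	spin_unlock(pte_ptl);

unlock_out:
	if (new_ptl != old_ptl)
		spin_unlock(new_ptl);
	spin_unlock(old_ptl);

	return true;
}

With that ordering, a parallel page table walker that is already operating on
the pte page under pte_ptl completes before the entries become visible at the
new address, which is the race the patch is meant to close.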