On Fri, Dec 25, 2020 at 01:25:29AM -0800, Nadav Amit wrote: > From: Nadav Amit <namit@xxxxxxxxxx> > > Clearing soft-dirty through /proc/[pid]/clear_refs can cause memory > corruption as it clears the dirty-bit without acquiring the mmap_lock > for write and defers TLB flushes. > > As a result of this behavior, it is possible that one of the CPUs would > have the stale PTE cached in its TLB and keep updating the page while > another thread triggers a page-fault, and the page-fault handler would > copy the old page into a new one. [...] > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index 217aa2705d5d..39b2bd27af79 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -1189,6 +1189,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, > struct mm_struct *mm; > struct vm_area_struct *vma; > enum clear_refs_types type; > + bool write_lock = false; > struct mmu_gather tlb; > int itype; > int rv; > @@ -1236,21 +1237,16 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, > } > tlb_gather_mmu(&tlb, mm, 0, -1); > if (type == CLEAR_REFS_SOFT_DIRTY) { > + mmap_read_unlock(mm); > + if (mmap_write_lock_killable(mm)) { > + count = -EINTR; > + goto out_mm; > + } > for (vma = mm->mmap; vma; vma = vma->vm_next) { > - if (!(vma->vm_flags & VM_SOFTDIRTY)) > - continue; > - mmap_read_unlock(mm); > - if (mmap_write_lock_killable(mm)) { > - count = -EINTR; > - goto out_mm; > - } > - for (vma = mm->mmap; vma; vma = vma->vm_next) { > - vma->vm_flags &= ~VM_SOFTDIRTY; > - vma_set_page_prot(vma); > - } > - mmap_write_downgrade(mm); > - break; > + vma->vm_flags &= ~VM_SOFTDIRTY; > + vma_set_page_prot(vma); > } > + write_lock = true; > > mmu_notifier_range_init(&range, MMU_NOTIFY_SOFT_DIRTY, > 0, NULL, mm, 0, -1UL); > @@ -1261,7 +1257,10 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, > if (type == CLEAR_REFS_SOFT_DIRTY) > mmu_notifier_invalidate_range_end(&range); > tlb_finish_mmu(&tlb, 0, -1); > - mmap_read_unlock(mm); > + if (write_lock) > + mmap_write_unlock(mm); > + else > + mmap_read_unlock(mm); > out_mm: > mmput(mm); I probably wouldn't bother with the 'write_lock' variable, and just check 'type == CLEAR_REFS_SOFT_DIRTY' instead. But that's trivial and I don't have strong opinions, so: Acked-by: Will Deacon <will@xxxxxxxxxx> Are you intending to land this for 5.11? If so, I can just rebase my other series on top of this. Will