Re: [patch 128/192] mm: improve mprotect(R|W) efficiency on pages referenced once

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 29, 2021 at 10:50:12AM -0700, Linus Torvalds wrote:
> On Mon, Jun 28, 2021 at 7:40 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> >
> > -                       /* Avoid taking write faults for known dirty pages */
> > -                       if (dirty_accountable && pte_dirty(ptent) &&
> > -                                       (pte_soft_dirty(ptent) ||
> > -                                        !(vma->vm_flags & VM_SOFTDIRTY))) {
> > +                       if (may_avoid_write_fault(ptent, vma, cp_flags))
> >                                 ptent = pte_mkwrite(ptent);
> > -                       }
> 
> Hmm. I don't think this is correct.
> 
> As fat as I can tell, may_avoid_write_fault() doesn't even check if
> the vma is writable!
> 
> Am I misreading it? Because I think you just made even a shared mmap
> with "mprotect(PROT_READ)" turn the pte's writable.
> 
> Which is a "slight" security issue.
> 
> Maybe the new code is fine, and I'm missing something. The old code
> looks strange too, which makes me think that the MM_CP_DIRTY_ACCT test
> ends up saving us and depend on VM_WRITE. But it's very much not
> obvious.

vma_wants_writenotify() checks first VM_WRITE|VM_SHARED, otherwise
MM_CP_DIRTY_ACCT will not be set.  While for anonymous vmas the newly
introduced may_avoid_write_fault() checks VM_WRITE explicitly.

Agreed even if it's checked it's not straightforward.  Maybe it'll be a bonus
to have a comment above may_avoid_write_fault() about it in a follow up.

> 
> And even if I _am_ missing something, I really would like a very
> obvious and direct test for "this vma is writable", ie maybe a
> 
>         if (!(vma->vm_flags & VM_WRITE))
>                 return false;
> 
> at the very top of the function.

Yes looks okay too; I think using MM_CP_DIRTY_ACCT flag has a slight advantage
in that it checks VM_WRITE only once before calling change_protection(), rather
than doing the check for every pte even if we know it'll have the same result.
However it indeed hides the facts deeper..

> 
> And no, "pte_dirty()" is not a reason to make something writable, it
> might have started out as a writable mapping, and we dirtied the page,
> and we made it read-only. The page stays dirty, but it shouldn't
> become writable just because of that.

I think the dirty bit checks are only to make sure we don't need those extra
write faults.  It should definitely be based on the fact that VM_WRITE being
set already.

Thanks,

-- 
Peter Xu




[Index of Archives]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux