Re: [patch 128/192] mm: improve mprotect(R|W) efficiency on pages referenced once

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 29, 2021 at 08:12:12PM -0400, Peter Xu wrote:
> On Tue, Jun 29, 2021 at 10:50:12AM -0700, Linus Torvalds wrote:
> > On Mon, Jun 28, 2021 at 7:40 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > >
> > > -                       /* Avoid taking write faults for known dirty pages */
> > > -                       if (dirty_accountable && pte_dirty(ptent) &&
> > > -                                       (pte_soft_dirty(ptent) ||
> > > -                                        !(vma->vm_flags & VM_SOFTDIRTY))) {
> > > +                       if (may_avoid_write_fault(ptent, vma, cp_flags))
> > >                                 ptent = pte_mkwrite(ptent);
> > > -                       }
> > 
> > Hmm. I don't think this is correct.
> > 
> > As fat as I can tell, may_avoid_write_fault() doesn't even check if
> > the vma is writable!
> > 
> > Am I misreading it? Because I think you just made even a shared mmap
> > with "mprotect(PROT_READ)" turn the pte's writable.
> > 
> > Which is a "slight" security issue.
> > 
> > Maybe the new code is fine, and I'm missing something. The old code
> > looks strange too, which makes me think that the MM_CP_DIRTY_ACCT test
> > ends up saving us and depend on VM_WRITE. But it's very much not
> > obvious.
> 
> vma_wants_writenotify() checks first VM_WRITE|VM_SHARED, otherwise
> MM_CP_DIRTY_ACCT will not be set.  While for anonymous vmas the newly
> introduced may_avoid_write_fault() checks VM_WRITE explicitly.

Sorry, this statement is unclear.  It's not about anonymous or not, it's just
that a hidden check against VM_WRITE is there already..

Say, below chunk of the patch:

 	if (!(cp_flags & MM_CP_DIRTY_ACCT)) {
 		/* Otherwise, we must have exclusive access to the page. */
 		if (!(vma_is_anonymous(vma) && (vma->vm_flags & VM_WRITE)))
 			return false;
 
 		if (page_count(pte_page(pte)) != 1)
 			return false;
 	}

Should be the same as:

 	if (!(cp_flags & MM_CP_DIRTY_ACCT)) {
 		if (!vma_is_anonymous(vma))
 			return false;

                if (!(vma->vm_flags & VM_WRITE))
                        return false;
 
 		if (page_count(pte_page(pte)) != 1)
 			return false;
 	}

And since MM_CP_DIRTY_ACCT implies "VM_WRITE|VM_SHARED" all set, above should
be a slightly faster version of below:

        /* This just never trigger if MM_CP_DIRTY_ACCT set */
        if (!(vma->vm_flags & VM_WRITE))
                return false;
 
 	if (!(cp_flags & MM_CP_DIRTY_ACCT)) {
 		if (!vma_is_anonymous(vma))
 			return false;

 		if (page_count(pte_page(pte)) != 1)
 			return false;
 	}

It's just that we avoid checking "vma->vm_flags & VM_WRITE" when
MM_CP_DIRTY_ACCT set.

Again, I think in all cases some more comment should be good indeed..

> 
> Agreed even if it's checked it's not straightforward.  Maybe it'll be a bonus
> to have a comment above may_avoid_write_fault() about it in a follow up.
> 
> > 
> > And even if I _am_ missing something, I really would like a very
> > obvious and direct test for "this vma is writable", ie maybe a
> > 
> >         if (!(vma->vm_flags & VM_WRITE))
> >                 return false;
> > 
> > at the very top of the function.
> 
> Yes looks okay too; I think using MM_CP_DIRTY_ACCT flag has a slight advantage
> in that it checks VM_WRITE only once before calling change_protection(), rather
> than doing the check for every pte even if we know it'll have the same result.
> However it indeed hides the facts deeper..
> 
> > 
> > And no, "pte_dirty()" is not a reason to make something writable, it
> > might have started out as a writable mapping, and we dirtied the page,
> > and we made it read-only. The page stays dirty, but it shouldn't
> > become writable just because of that.
> 
> I think the dirty bit checks are only to make sure we don't need those extra
> write faults.  It should definitely be based on the fact that VM_WRITE being
> set already.
> 
> Thanks,
> 
> -- 
> Peter Xu

-- 
Peter Xu




[Index of Archives]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux