On Tue, Jun 29, 2021 at 08:12:12PM -0400, Peter Xu wrote: > On Tue, Jun 29, 2021 at 10:50:12AM -0700, Linus Torvalds wrote: > > On Mon, Jun 28, 2021 at 7:40 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > > > > > > - /* Avoid taking write faults for known dirty pages */ > > > - if (dirty_accountable && pte_dirty(ptent) && > > > - (pte_soft_dirty(ptent) || > > > - !(vma->vm_flags & VM_SOFTDIRTY))) { > > > + if (may_avoid_write_fault(ptent, vma, cp_flags)) > > > ptent = pte_mkwrite(ptent); > > > - } > > > > Hmm. I don't think this is correct. > > > > As fat as I can tell, may_avoid_write_fault() doesn't even check if > > the vma is writable! > > > > Am I misreading it? Because I think you just made even a shared mmap > > with "mprotect(PROT_READ)" turn the pte's writable. > > > > Which is a "slight" security issue. > > > > Maybe the new code is fine, and I'm missing something. The old code > > looks strange too, which makes me think that the MM_CP_DIRTY_ACCT test > > ends up saving us and depend on VM_WRITE. But it's very much not > > obvious. > > vma_wants_writenotify() checks first VM_WRITE|VM_SHARED, otherwise > MM_CP_DIRTY_ACCT will not be set. While for anonymous vmas the newly > introduced may_avoid_write_fault() checks VM_WRITE explicitly. Sorry, this statement is unclear. It's not about anonymous or not, it's just that a hidden check against VM_WRITE is there already.. Say, below chunk of the patch: if (!(cp_flags & MM_CP_DIRTY_ACCT)) { /* Otherwise, we must have exclusive access to the page. */ if (!(vma_is_anonymous(vma) && (vma->vm_flags & VM_WRITE))) return false; if (page_count(pte_page(pte)) != 1) return false; } Should be the same as: if (!(cp_flags & MM_CP_DIRTY_ACCT)) { if (!vma_is_anonymous(vma)) return false; if (!(vma->vm_flags & VM_WRITE)) return false; if (page_count(pte_page(pte)) != 1) return false; } And since MM_CP_DIRTY_ACCT implies "VM_WRITE|VM_SHARED" all set, above should be a slightly faster version of below: /* This just never trigger if MM_CP_DIRTY_ACCT set */ if (!(vma->vm_flags & VM_WRITE)) return false; if (!(cp_flags & MM_CP_DIRTY_ACCT)) { if (!vma_is_anonymous(vma)) return false; if (page_count(pte_page(pte)) != 1) return false; } It's just that we avoid checking "vma->vm_flags & VM_WRITE" when MM_CP_DIRTY_ACCT set. Again, I think in all cases some more comment should be good indeed.. > > Agreed even if it's checked it's not straightforward. Maybe it'll be a bonus > to have a comment above may_avoid_write_fault() about it in a follow up. > > > > > And even if I _am_ missing something, I really would like a very > > obvious and direct test for "this vma is writable", ie maybe a > > > > if (!(vma->vm_flags & VM_WRITE)) > > return false; > > > > at the very top of the function. > > Yes looks okay too; I think using MM_CP_DIRTY_ACCT flag has a slight advantage > in that it checks VM_WRITE only once before calling change_protection(), rather > than doing the check for every pte even if we know it'll have the same result. > However it indeed hides the facts deeper.. > > > > > And no, "pte_dirty()" is not a reason to make something writable, it > > might have started out as a writable mapping, and we dirtied the page, > > and we made it read-only. The page stays dirty, but it shouldn't > > become writable just because of that. > > I think the dirty bit checks are only to make sure we don't need those extra > write faults. It should definitely be based on the fact that VM_WRITE being > set already. > > Thanks, > > -- > Peter Xu -- Peter Xu