On Tue, Jun 29, 2021 at 10:50:12AM -0700, Linus Torvalds wrote: > On Mon, Jun 28, 2021 at 7:40 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > > > - /* Avoid taking write faults for known dirty pages */ > > - if (dirty_accountable && pte_dirty(ptent) && > > - (pte_soft_dirty(ptent) || > > - !(vma->vm_flags & VM_SOFTDIRTY))) { > > + if (may_avoid_write_fault(ptent, vma, cp_flags)) > > ptent = pte_mkwrite(ptent); > > - } > > Hmm. I don't think this is correct. > > As fat as I can tell, may_avoid_write_fault() doesn't even check if > the vma is writable! > > Am I misreading it? Because I think you just made even a shared mmap > with "mprotect(PROT_READ)" turn the pte's writable. > > Which is a "slight" security issue. > > Maybe the new code is fine, and I'm missing something. The old code > looks strange too, which makes me think that the MM_CP_DIRTY_ACCT test > ends up saving us and depend on VM_WRITE. But it's very much not > obvious. vma_wants_writenotify() checks first VM_WRITE|VM_SHARED, otherwise MM_CP_DIRTY_ACCT will not be set. While for anonymous vmas the newly introduced may_avoid_write_fault() checks VM_WRITE explicitly. Agreed even if it's checked it's not straightforward. Maybe it'll be a bonus to have a comment above may_avoid_write_fault() about it in a follow up. > > And even if I _am_ missing something, I really would like a very > obvious and direct test for "this vma is writable", ie maybe a > > if (!(vma->vm_flags & VM_WRITE)) > return false; > > at the very top of the function. Yes looks okay too; I think using MM_CP_DIRTY_ACCT flag has a slight advantage in that it checks VM_WRITE only once before calling change_protection(), rather than doing the check for every pte even if we know it'll have the same result. However it indeed hides the facts deeper.. > > And no, "pte_dirty()" is not a reason to make something writable, it > might have started out as a writable mapping, and we dirtied the page, > and we made it read-only. The page stays dirty, but it shouldn't > become writable just because of that. I think the dirty bit checks are only to make sure we don't need those extra write faults. It should definitely be based on the fact that VM_WRITE being set already. Thanks, -- Peter Xu