On Mon, Dec 21, 2020 at 2:30 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> AFAIU mprotect() is the only one who modifies the pte using the mmap write
> lock.  NUMA balancing is also using read mmap lock when changing pte
> protections, while my understanding is mprotect() used write lock only because
> it manipulates the address space itself (aka. vma layout) rather than modifying
> the ptes, so it needs to.

So it's ok to change the pte holding only the PTE lock, if it's a
*one*way* conversion.

That doesn't break the "re-check the PTE contents" model (which
predates _all_ of the rest: NUMA, userfaultfd, everything - it's
pretty much the original model for our page table operations, and goes
back to the dark ages even before SMP and the existence of a page
table lock).

So for example, a COW will always create a different pte (not just
because the page number itself changes - you could imagine a page
getting re-used and changing back - but because it's always a RO->RW
transition).

So two COW operations cannot "undo" each other and fool us into
thinking nothing changed.

Anything that changes RW->RO - like fork(), for example - needs to
take the mmap_lock.

NUMA balancing should be ok wrt COW, because it doesn't do that RW->RO
thing, it uses the present bit.

I think that you are right that NUMA balancing itself might cause
other issues, because it can cause that "pte changed and then came
back" (for numa protection and then a numa fault) all with just the
mmap lock for reading.

However, even that shouldn't matter for COW, because the write protect
bit is the one that protects the *contents* of the page, so even if
NUMA balancing caused that "load original PTE, then re-check later" to
succeed (despite the PTE actually changing in the middle), the
_contents_ of the page cannot have changed, so COW is ok. NUMA
balancing won't be making a read-only page temporarily writable.

             Linus
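
For illustration only, the "load the PTE, then re-check it later" argument can be modelled with a small user-space sketch in C. This is a toy, not kernel code: the bit layout and every toy_* name are made up here, and the NUMA helpers simply follow the description above (toggle the present bit, leave the write bit alone) rather than any real architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE layout for this sketch only. */
#define TOY_PTE_PRESENT (1ull << 0)
#define TOY_PTE_WRITE   (1ull << 1)
#define TOY_PFN_SHIFT   12

typedef uint64_t toy_pte_t;

/* Build a present PTE for page frame 'pfn', optionally writable. */
toy_pte_t toy_mk_pte(uint64_t pfn, bool writable)
{
	return (pfn << TOY_PFN_SHIFT) | TOY_PTE_PRESENT |
	       (writable ? TOY_PTE_WRITE : 0);
}

/*
 * The re-check step: compare the PTE sampled earlier with the PTE
 * found after retaking the lock. Equal means "assume nothing changed".
 */
bool toy_pte_same(toy_pte_t a, toy_pte_t b)
{
	return a == b;
}

/* COW: map 'new_pfn' writable. Only ever applied to a read-only PTE. */
toy_pte_t toy_cow(toy_pte_t pte, uint64_t new_pfn)
{
	assert(!(pte & TOY_PTE_WRITE));	/* one-way: RO -> RW */
	return toy_mk_pte(new_pfn, true);
}

/* fork()-style write protection: RW -> RO, the direction that can undo a COW. */
toy_pte_t toy_wrprotect(toy_pte_t pte)
{
	return pte & ~TOY_PTE_WRITE;
}

/* NUMA-style hinting as described above: clear the present bit only. */
toy_pte_t toy_numa_protect(toy_pte_t pte)
{
	return pte & ~TOY_PTE_PRESENT;
}

/* NUMA hinting fault: restore the present bit, write bit untouched. */
toy_pte_t toy_numa_fault(toy_pte_t pte)
{
	return pte | TOY_PTE_PRESENT;
}
```

With ro = toy_mk_pte(100, false), toy_pte_same(ro, toy_cow(ro, 100)) is false even though the page number is re-used, because the write bit differs. toy_pte_same(ro, toy_wrprotect(toy_cow(ro, 100))) is true, which is the "changed and came back" case that the RW->RO side has to exclude by taking the mmap_lock. toy_numa_fault(toy_numa_protect(ro)) also compares equal to ro, but the write bit was never set in between, so in this model the page contents could not have been modified through that mapping.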