On Jul 20, 2022, at 8:19 AM, David Hildenbrand <david@xxxxxxxxxx> wrote: > On 18.07.22 14:02, Nadav Amit wrote: >> From: Nadav Amit <namit@xxxxxxxxxx> >> >> Anonymous pages might have the dirty bit clear, but this should not >> prevent mprotect from making them writable if they are exclusive. >> Therefore, skip the test whether the page is dirty in this case. >> >> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> >> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> >> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >> Cc: Andy Lutomirski <luto@xxxxxxxxxx> >> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> >> Cc: David Hildenbrand <david@xxxxxxxxxx> >> Cc: Peter Xu <peterx@xxxxxxxxxx> >> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> >> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> >> Cc: Will Deacon <will@xxxxxxxxxx> >> Cc: Yu Zhao <yuzhao@xxxxxxxxxx> >> Cc: Nick Piggin <npiggin@xxxxxxxxx> >> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx> >> --- >> mm/mprotect.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/mm/mprotect.c b/mm/mprotect.c >> index 34c2dfb68c42..da5b9bf8204f 100644 >> --- a/mm/mprotect.c >> +++ b/mm/mprotect.c >> @@ -45,7 +45,7 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma, >> >> VM_BUG_ON(!(vma->vm_flags & VM_WRITE) || pte_write(pte)); >> >> - if (pte_protnone(pte) || !pte_dirty(pte)) >> + if (pte_protnone(pte)) >> return false; >> >> /* Do we need write faults for softdirty tracking? */ >> @@ -66,7 +66,8 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma, >> page = vm_normal_page(vma, addr, pte); >> if (!page || !PageAnon(page) || !PageAnonExclusive(page)) >> return false; >> - } >> + } else if (!pte_dirty(pte)) >> + return false; >> >> return true; >> } > > When I wrote that code, I was wondering how often that would actually > happen in practice -- and if we care about optimizing that. Do you have > a gut feeling in which scenarios this would happen and if we care? > > If the page is in the swapcache and was swapped out, you'd be requiring > a writeback even though nobody modified the page and possibly isn't > going to do so in the near future. So here is my due diligence: I did not really encounter a scenario in which it showed up. When I looked at your code, I assumed this was an oversight and not a thoughtful decision. For me the issue is more of the discrepancy between how a certain page is handled before and after it was pages out. The way that I see it, there is a tradeoff in the way dirty-bit should be handled: (1) Writable-clean PTEs introduce some non-negligible overhead. (2) Marking a PTE dirty speculatively would require a write back. … But this tradeoff should not affect whether a PTE is writable, i.e., mapping the PTE as writable-clean should not cause a writeback. In other words, if you are concerned about unnecessary writebacks, which I think is a fair concern, then do not set the dirty-bit. In a later patch I try to avoid TLB flushes on clean-writable entries that are write-protected. So I do not think that the writeback you mentioned should be a real issue. Yet if you think that using the fact that the page is not-dirty is a good hueristics to avoid future TLB flushes (for P->NP; as I said there is a solution for RW->RO), or if you are concerned about the cost of vm_normal_page(), perhaps those are valid concerned (although I do not think so). -- [ Regarding (1): After some discussions with Peter and reading more code, I thought at some point that perhaps avoiding having writable-clean PTE as much as possible makes sense [*], since setting the dirty-bit costs ~550 cycles and a page fault is not a lot more than 1000. But with all the mitigations (and after adding IBRS for retbless) page-fault entry is kind of expensive. [*] At least on x86 ]