Re: [RFC PATCH 03/14] mm/mprotect: allow exclusive anon pages to be writable

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Jul 20, 2022, at 8:19 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:

> On 18.07.22 14:02, Nadav Amit wrote:
>> From: Nadav Amit <namit@xxxxxxxxxx>
>> 
>> Anonymous pages might have the dirty bit clear, but this should not
>> prevent mprotect from making them writable if they are exclusive.
>> Therefore, skip the test whether the page is dirty in this case.
>> 
>> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
>> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>> Cc: David Hildenbrand <david@xxxxxxxxxx>
>> Cc: Peter Xu <peterx@xxxxxxxxxx>
>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Will Deacon <will@xxxxxxxxxx>
>> Cc: Yu Zhao <yuzhao@xxxxxxxxxx>
>> Cc: Nick Piggin <npiggin@xxxxxxxxx>
>> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
>> ---
>> mm/mprotect.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>> 
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index 34c2dfb68c42..da5b9bf8204f 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -45,7 +45,7 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
>> 
>> 	VM_BUG_ON(!(vma->vm_flags & VM_WRITE) || pte_write(pte));
>> 
>> -	if (pte_protnone(pte) || !pte_dirty(pte))
>> +	if (pte_protnone(pte))
>> 		return false;
>> 
>> 	/* Do we need write faults for softdirty tracking? */
>> @@ -66,7 +66,8 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
>> 		page = vm_normal_page(vma, addr, pte);
>> 		if (!page || !PageAnon(page) || !PageAnonExclusive(page))
>> 			return false;
>> -	}
>> +	} else if (!pte_dirty(pte))
>> +		return false;
>> 
>> 	return true;
>> }
> 
> When I wrote that code, I was wondering how often that would actually
> happen in practice -- and if we care about optimizing that. Do you have
> a gut feeling in which scenarios this would happen and if we care?
> 
> If the page is in the swapcache and was swapped out, you'd be requiring
> a writeback even though nobody modified the page and possibly isn't
> going to do so in the near future.

So here is my due diligence: I did not really encounter a scenario in which
it showed up. When I looked at your code, I assumed this was an oversight
and not a thoughtful decision. For me the issue is more of the discrepancy
between how a certain page is handled before and after it was pages out.

The way that I see it, there is a tradeoff in the way dirty-bit should
be handled:
(1) Writable-clean PTEs introduce some non-negligible overhead.
(2) Marking a PTE dirty speculatively would require a write back.

… But this tradeoff should not affect whether a PTE is writable, i.e.,
mapping the PTE as writable-clean should not cause a writeback. In other
words, if you are concerned about unnecessary writebacks, which I think is a
fair concern, then do not set the dirty-bit. In a later patch I try to avoid
TLB flushes on clean-writable entries that are write-protected.

So I do not think that the writeback you mentioned should be a real issue.
Yet if you think that using the fact that the page is not-dirty is a good
hueristics to avoid future TLB flushes (for P->NP; as I said there is a
solution for RW->RO), or if you are concerned about the cost of
vm_normal_page(), perhaps those are valid concerned (although I do not think
so).

--

[ Regarding (1): After some discussions with Peter and reading more code, I
thought at some point that perhaps avoiding having writable-clean PTE as
much as possible makes sense [*], since setting the dirty-bit costs ~550
cycles and a page fault is not a lot more than 1000. But with all the
mitigations (and after adding IBRS for retbless) page-fault entry is kind of
expensive. 

[*] At least on x86 ]




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux