Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Feb 20, 2022, at 10:23 PM, Peter Xu <peterx@xxxxxxxxxx> wrote:
> 
> On Thu, Feb 17, 2022 at 08:00:12PM -0800, Nadav Amit wrote:
>> 
>>> On Feb 17, 2022, at 6:23 PM, Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
>>> 
>>>> PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be:
>>>> 
>>>>      if (dirty_accountable && pte_dirty(ptent) &&
>>>>                      (pte_soft_dirty(ptent) ||
>>>>                      (vma->vm_flags & VM_SOFTDIRTY))) {
>>>>              ptent = pte_mkwrite(ptent);
>>>>      }
>> 
>> I know it is off-topic (not directly related to my patch), but
>> I tried to understand the logic - both of the existing code and of
>> your suggested change - and I failed.
>> 
>> IIUC dirty_accountable (whose value is taken from
>> vma_wants_writenotify()) means that the writes *should* be tracked,
>> and therefore the page should remain read-only.
> 
> Right.
> 
>> 
>> So basically the condition should have been based on
>> !dirty_accountable, i.e. the inverted value of dirty_accountable.
>> 
>> The problem is that dirty_accountable also reflects VM_SOFTDIRTY
>> considerations, so looking on the PTE does not tell you whether
>> the PTE should remain write-protected even if it is dirty.
> 
> My understanding is that the dirty bits (especially if both set) means
> we've tracked dirty on this pte already so we don't need to, hence we can
> set the dirty bit here.  E.g., continuous mprotect(RO), mprotect(RW) upon a
> full dirty pte.
> 
> When something wants to enable tracking again, it needs to clear the dirty
> bit, either the real one or soft-dirty one.  So it's a pure performance
> enhancement to conditionally set write bit here, when we're sure we won't
> need any further tracking on this pte.
> 
> One thing to mention is that this path only applies to VM_SHARED|VM_WRITE,
> because that's what checked the first in vma_wants_writenotify():
> 
> 	/* If it was private or non-writable, the write bit is already clear */
> 	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
> 		return 0;
> 
> IOW private mappings are not optimized in current tree yet.
> 
> Peter Collingbourne proposed a patch some time ago to optimize it but it
> didn't get merged somehow.  Meanwhile even with his latest version it
> should still miss the thp case, so if to reference the private optimization
> Andrea's tree would be the best:
> 
> https://github.com/aagit/aa/commit/fadb5e04d94472614c76819acd979b2f60e4eff6
> 
> Hope it clarifies things a bit.  Thanks,

Thanks for the clarification. That’s what I suspected - I did not encounter
it since I only used private anonymous mappings. I will try to create a
test-case and send an additional fix for this issue.

Regards,
Nadav



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux