Re: [PATCH] userfaultfd: mark uffd_wp regardless of VM_WRITE flag

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 17, 2022 at 08:00:12PM -0800, Nadav Amit wrote:
> 
> > On Feb 17, 2022, at 6:23 PM, Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
> > 
> >> PS: I always think here the VM_SOFTDIRTY check is wrong, IMHO it should be:
> >> 
> >>       if (dirty_accountable && pte_dirty(ptent) &&
> >>                       (pte_soft_dirty(ptent) ||
> >>                       (vma->vm_flags & VM_SOFTDIRTY))) {
> >>               ptent = pte_mkwrite(ptent);
> >>       }
> 
> I know it is off-topic (not directly related to my patch), but
> I tried to understand the logic - both of the existing code and of
> your suggested change - and I failed.
> 
> IIUC dirty_accountable (whose value is taken from
> vma_wants_writenotify()) means that the writes *should* be tracked,
> and therefore the page should remain read-only.

Right.

> 
> So basically the condition should have been based on
> !dirty_accountable, i.e. the inverted value of dirty_accountable.
> 
> The problem is that dirty_accountable also reflects VM_SOFTDIRTY
> considerations, so looking on the PTE does not tell you whether
> the PTE should remain write-protected even if it is dirty.

My understanding is that the dirty bits (especially if both set) means
we've tracked dirty on this pte already so we don't need to, hence we can
set the dirty bit here.  E.g., continuous mprotect(RO), mprotect(RW) upon a
full dirty pte.

When something wants to enable tracking again, it needs to clear the dirty
bit, either the real one or soft-dirty one.  So it's a pure performance
enhancement to conditionally set write bit here, when we're sure we won't
need any further tracking on this pte.

One thing to mention is that this path only applies to VM_SHARED|VM_WRITE,
because that's what checked the first in vma_wants_writenotify():

	/* If it was private or non-writable, the write bit is already clear */
	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
		return 0;

IOW private mappings are not optimized in current tree yet.

Peter Collingbourne proposed a patch some time ago to optimize it but it
didn't get merged somehow.  Meanwhile even with his latest version it
should still miss the thp case, so if to reference the private optimization
Andrea's tree would be the best:

https://github.com/aagit/aa/commit/fadb5e04d94472614c76819acd979b2f60e4eff6

Hope it clarifies things a bit.  Thanks,

-- 
Peter Xu





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux