Re: [PATCH RFC] userfaultfd: introduce UFFDIO_COPY_MODE_YOUNG

On Tue, Jun 14, 2022 at 09:18:43AM -0700, Nadav Amit wrote:
> On Jun 14, 2022, at 8:22 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:
> 
> > On 13.06.22 22:40, Nadav Amit wrote:
> >> From: Nadav Amit <namit@xxxxxxxxxx>
> >> 
> >> As we know, using a PTE on x86 with cleared access-bit (aka young-bit)
> >> takes ~600 cycles more than when the access-bit is set. At the same
> >> time, setting the access-bit for memory that is not used (e.g.,
> >> prefetched) can introduce greater overheads, as the prefetched memory is
> >> reclaimed later than it should be.
> >> 
> >> Userfaultfd currently does not set the access-bit (excluding the
> >> huge-pages case). Arguably, it is best to let the uffd monitor control
> >> whether the access-bit should be set or not. The expected use is for the
> >> monitor to request userfaultfd to set the access-bit when the copy
> >> operation is done to resolve a page-fault, and not to set the young-bit
> >> when the memory is prefetched.
> > 
> > Thinking out loud about existing users: postcopy live migration in QEMU
> > has two uses for page placement:
> > 
> > a) Resolving a fault. E.g., a VCPU might be waiting for resolution to
> > make progress.
> > b) Background migration to converge without faults on all relevant
> > pages.
> > 
> > I guess in a) we'd want UFFDIO_COPY_MODE_YOUNG, and in b) we don't want it.
> > 
> > 
> > I wonder, however, instead of calling this "young", which implies what
> > the OS should or shouldn't do, to define this as a hint that the placed
> > page is very likely to be accessed next.
> > 
> > I'm bad at naming, UFFDIO_COPY_MODE_ACCESS_LIKELY would express what I
> > have in mind.
> 
> How about UFFDIO_COPY_MODE_WILLNEED_READ ?
> 
> > 
> >> Introduce UFFDIO_COPY_MODE_YOUNG to enable userspace to request the
> >> young bit to be set. For UFFDIO_CONTINUE and UFFDIO_ZEROPAGE set the bit
> >> unconditionally since the former is only used to resolve page-faults and
> >> the latter would not benefit from not setting the access-bit.
> >> 
> >> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> >> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> >> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> >> Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
> >> Cc: Peter Xu <peterx@xxxxxxxxxx>
> >> Cc: David Hildenbrand <david@xxxxxxxxxx>
> >> Cc: Mike Rapoport <rppt@xxxxxxxxxxxxx>
> >> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
> >> 
> >> ---
> >> 
> >> There are 2 possible enhancements:
> >> 
> >> 1. Use the flag to decide on whether to mark the PTE as dirty (for
> >> writable PTEs). I guess that setting the dirty-bit is as expensive as
> >> setting the access-bit, and setting it introduces similar tradeoffs,
> >> as mentioned above.
> >> 
> >> 2. Introduce a similar mode for write-protect and use this information
> >> for setting both the young and dirty bits. Makes one wonder whether
> >> mprotect() should also set the bit in certain cases...
> > 
> > I wonder if UFFDIO_COPY_MODE_READ_ACCESS_LIKELY vs.
> > UFFDIO_COPY_WRITE_ACCESS_LIKELY could even make sense. I feel like it could.
> > 
> > For example, QEMU knows if a page fault it's resolving was due to a read
> > or a write fault and could use that information accordingly. Of course,
> > when we currently have a read fault, we can't know for sure whether we
> > will get a write fault immediately after.
> > 
> > Especially in the context of UFFDIO_ZEROPAGE,
> > UFFDIO_ZEROPAGE_WRITE_ACCESS_LIKELY could ... not place the zeropage but
> > instead populate an actual page and mark it accessed+dirty. I even have
> > a use case for that ;)
> > 
> > 
> > The kernel could decide how to treat these hints -- for example, if it
> > doesn't want user space to mess with access/dirty bits, it could just
> > mostly ignore the hints.
> 
> I can do that. I think users can do the zero page-copy themselves today, but
> whatever you prefer.
> 
> But, I cannot take it anymore: the list of arguments for uffd stuff is
> crazy. I would like to collect all the possible arguments that are used for
> uffd operation into some “struct uffd_op”.

Squashing boolean parameters into int flags will also reduce the insane
number of parameters. No strong feelings though.
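
To illustrate the flags idea, a minimal sketch might look like the
following. Every name in it (sk_uffd_flags_t, the SK_UFFD_FLAG_* bits, the
fields of struct sk_uffd_op) is hypothetical and is not taken from the patch
or from the kernel tree; it only shows how several bool arguments could
collapse into one word carried inside a single operation struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag word replacing a series of bool parameters. */
typedef uint32_t sk_uffd_flags_t;

#define SK_UFFD_FLAG_WP            (1u << 0) /* install the PTE write-protected */
#define SK_UFFD_FLAG_ZEROPAGE      (1u << 1) /* place the zero page, no copy */
#define SK_UFFD_FLAG_ACCESS_LIKELY (1u << 2) /* hint: page is accessed soon */
#define SK_UFFD_FLAG_ALL \
	(SK_UFFD_FLAG_WP | SK_UFFD_FLAG_ZEROPAGE | SK_UFFD_FLAG_ACCESS_LIKELY)

/* Hypothetical container for everything one uffd operation needs. */
struct sk_uffd_op {
	uint64_t dst_start;
	uint64_t src_start;
	uint64_t len;
	sk_uffd_flags_t flags;
};

/* Bridge for callers that still pass individual booleans. */
static inline sk_uffd_flags_t sk_uffd_flags_from_bools(bool wp, bool zeropage,
							bool access_likely)
{
	sk_uffd_flags_t flags = 0;

	if (wp)
		flags |= SK_UFFD_FLAG_WP;
	if (zeropage)
		flags |= SK_UFFD_FLAG_ZEROPAGE;
	if (access_likely)
		flags |= SK_UFFD_FLAG_ACCESS_LIKELY;
	return flags;
}

/* Build an operation; unknown flag bits are treated as a caller bug. */
static inline struct sk_uffd_op sk_uffd_op_init(uint64_t dst, uint64_t src,
						 uint64_t len,
						 sk_uffd_flags_t flags)
{
	struct sk_uffd_op op = { dst, src, len, flags };

	assert((flags & ~SK_UFFD_FLAG_ALL) == 0);
	return op;
}

/* True when every bit in 'flag' is set on the operation. */
static inline bool sk_uffd_op_has(const struct sk_uffd_op *op,
				  sk_uffd_flags_t flag)
{
	return (op->flags & flag) == flag;
}
```

With something along these lines, a helper would take a single
"const struct sk_uffd_op *" instead of a growing list of arguments, and a new
hint would only need a new bit.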
 
> Any objection?
> 
> 
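
For the QEMU cases David lists above (fault resolution vs. background
placement), the monitor side could pick the hint roughly as sketched here.
This is only a sketch under assumptions: the SK_COPY_MODE_* names and bit
positions are invented for illustration, no such flags exist in the uapi
header as of this thread, and setting both hints on a write fault is my
guess at a sensible policy rather than anything agreed on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hint bits for the mode field of a copy request. */
#define SK_COPY_MODE_READ_LIKELY  ((uint64_t)1 << 2)
#define SK_COPY_MODE_WRITE_LIKELY ((uint64_t)1 << 3)

/* Why the monitor is placing this page. */
enum sk_placement_reason {
	SK_PLACE_READ_FAULT,   /* a thread is blocked on a read fault */
	SK_PLACE_WRITE_FAULT,  /* a thread is blocked on a write fault */
	SK_PLACE_BACKGROUND,   /* prefetch or background migration */
};

/* Map the placement reason to hint bits for the copy request. */
static uint64_t sk_copy_mode_for(enum sk_placement_reason why)
{
	switch (why) {
	case SK_PLACE_READ_FAULT:
		return SK_COPY_MODE_READ_LIKELY;
	case SK_PLACE_WRITE_FAULT:
		/* Assumption: a write also counts as an access. */
		return SK_COPY_MODE_READ_LIKELY | SK_COPY_MODE_WRITE_LIKELY;
	case SK_PLACE_BACKGROUND:
		/* No hint, so the page can still be reclaimed early. */
		return 0;
	}
	assert(!"unknown placement reason");
	return 0;
}
```

The returned value would be OR-ed into the mode of the copy request next to
whatever mode bits the caller already sets.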

-- 
Sincerely yours,
Mike.



