On Jun 14, 2022, at 11:56 AM, Mike Rapoport <rppt@xxxxxxxxxxxxx> wrote:

> On Tue, Jun 14, 2022 at 09:18:43AM -0700, Nadav Amit wrote:
>> On Jun 14, 2022, at 8:22 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:
>>
>>> On 13.06.22 22:40, Nadav Amit wrote:
>>>> From: Nadav Amit <namit@xxxxxxxxxx>
>>>>
>>>> As we know, using a PTE on x86 with cleared access-bit (aka young-bit)
>>>> takes ~600 cycles more than when the access-bit is set. At the same
>>>> time, setting the access-bit for memory that is not used (e.g.,
>>>> prefetched) can introduce greater overheads, as the prefetched memory is
>>>> reclaimed later than it should be.
>>>>
>>>> Userfaultfd currently does not set the access-bit (excluding the
>>>> huge-pages case). Arguably, it is best to let the uffd monitor control
>>>> whether the access-bit should be set or not. The expected use is for the
>>>> monitor to request userfaultfd to set the access-bit when the copy
>>>> operation is done to resolve a page-fault, and not to set the young-bit
>>>> when the memory is prefetched.
>>>
>>> Thinking out loud about existing users: postcopy live migration in QEMU
>>> has two usage for placement of pages
>>>
>>> a) Resolving a fault. E.g., a VCPU might be waiting for resolution to
>>>    make progress.
>>> b) Background migration to converge without faults on all relevant
>>>    pages.
>>>
>>> I guess in a) we'd want UFFDIO_COPY_MODE_YOUNG in b) we don't want it.
>>>
>>>
>>> I wonder, however, instead of calling this "young", which implies what
>>> the OS should or shouldn't do, to define this as a hint that the placed
>>> page is very likely to be accessed next.
>>>
>>> I'm bad at naming, UFFDIO_COPY_MODE_ACCESS_LIKELY would express what I
>>> have in mind.
>>
>> How about UFFDIO_COPY_MODE_WILLNEED_READ ?
>>
>>>> Introduce UFFDIO_COPY_MODE_YOUNG to enable userspace to request the
>>>> young bit to be set. For UFFDIO_CONTINUE and UFFDIO_ZEROPAGE set the bit
>>>> unconditionally since the former is only used to resolve page-faults and
>>>> the latter would not benefit from not setting the access-bit.
>>>>
>>>> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>>>> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>> Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
>>>> Cc: Peter Xu <peterx@xxxxxxxxxx>
>>>> Cc: David Hildenbrand <david@xxxxxxxxxx>
>>>> Cc: Mike Rapoport <rppt@xxxxxxxxxxxxx>
>>>> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
>>>>
>>>> ---
>>>>
>>>> There are 2 possible enhancements:
>>>>
>>>> 1. Use the flag to decide on whether to mark the PTE as dirty (for
>>>>    writable PTEs). I guess that setting the dirty-bit is as expensive as
>>>>    setting the access-bit, and setting it introduces similar tradeoffs,
>>>>    as mentioned above.
>>>>
>>>> 2. Introduce a similar mode for write-protect and use this information
>>>>    for setting both the young and dirty bits. Makes one wonder whether
>>>>    mprotect() should also set the bit in certain cases...
>>>
>>> I wonder if UFFDIO_COPY_MODE_READ_ACCESS_LIKELY vs.
>>> UFFDIO_COPY_WRITE_ACCESS_LIKELY could even make sense. I feel like it could.
>>>
>>> For example, QEMU knows if a page fault it's resolving was due to a read
>>> or a write fault and could use that information accordingly. Of course,
>>> we don't completely know if we currently have a read fault, if we could
>>> get a write fault immediately after.
>>>
>>> Especially in the context of UFFDIO_ZEROPAGE,
>>> UFFDIO_ZEROPAGE_WRITE_ACCESS_LIKELY could ... not place the zeropage but
>>> instead populate an actual page and mark it accessed+dirty. I even have
>>> a use case for that ;)
>>>
>>>
>>> The kernel could decide how to treat these hints -- for example, if it
>>> doesn't want user space to mess with access/dirty bits, it could just
>>> mostly ignore the hints.
>>
>> I can do that. I think users can do the zero page-copy themselves today, but
>> whatever you prefer.
>>
>> But, I cannot take it anymore: the list of arguments for uffd stuff is
>> crazy. I would like to collect all the possible arguments that are used for
>> uffd operation into some “struct uffd_op”.
>
> Squashing boolean parameters into int flags will also reduce the insane
> amount of parameters. No strong feelings though.
>
>> Any objection?

Thanks. I also noticed a couple of embarrassing bugs that I made. Will send v1
with fixes.
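
To make the two ideas in this thread a bit more concrete (access hints chosen by the monitor, and boolean parameters squashed into a flags word inside one struct), here is a rough, userspace-only sketch. Everything in it is an assumption made up for illustration: the UFFD_HINT_* names and values, enum placement_reason, the layout of struct uffd_op and the helper names do not exist in any kernel header, and the mapping from hints to young/dirty is just one possible policy, not what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * HYPOTHETICAL hint bits, loosely named after the suggestions in this
 * thread. They are not part of the userfaultfd uAPI; values are arbitrary.
 */
#define UFFD_HINT_READ_LIKELY   (1u << 0)
#define UFFD_HINT_WRITE_LIKELY  (1u << 1)

/* Hypothetical: why the monitor is placing this page. */
enum placement_reason {
	PLACE_RESOLVE_READ_FAULT,   /* a thread is blocked on a read fault */
	PLACE_RESOLVE_WRITE_FAULT,  /* a thread is blocked on a write fault */
	PLACE_BACKGROUND,           /* prefetch / background migration */
};

/* Hypothetical "struct uffd_op": one struct instead of a long argument list. */
struct uffd_op {
	uint64_t dst;
	uint64_t src;
	uint64_t len;
	uint32_t flags;   /* UFFD_HINT_* bits instead of separate booleans */
};

/* Pick hints from the placement reason; background placement gets none. */
static inline uint32_t hint_flags_for(enum placement_reason reason)
{
	switch (reason) {
	case PLACE_RESOLVE_READ_FAULT:
		return UFFD_HINT_READ_LIKELY;
	case PLACE_RESOLVE_WRITE_FAULT:
		return UFFD_HINT_WRITE_LIKELY;
	case PLACE_BACKGROUND:
	default:
		return 0;
	}
}

static inline void uffd_op_init(struct uffd_op *op, uint64_t dst,
				uint64_t src, uint64_t len,
				enum placement_reason reason)
{
	assert(op != NULL);
	op->dst = dst;
	op->src = src;
	op->len = len;
	op->flags = hint_flags_for(reason);
}

/*
 * One possible interpretation of the hints (an assumption): any expected
 * access makes the PTE young, only an expected write makes it dirty.
 */
static inline bool uffd_op_wants_young(const struct uffd_op *op)
{
	return (op->flags & (UFFD_HINT_READ_LIKELY | UFFD_HINT_WRITE_LIKELY)) != 0;
}

static inline bool uffd_op_wants_dirty(const struct uffd_op *op)
{
	return (op->flags & UFFD_HINT_WRITE_LIKELY) != 0;
}
```

With something along these lines, the QEMU postcopy case a) would initialise the op with one of the PLACE_RESOLVE_* reasons and case b) with PLACE_BACKGROUND, and whoever consumes the op would be free to ignore the hints entirely.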