On 13.06.22 22:40, Nadav Amit wrote:
> From: Nadav Amit <namit@xxxxxxxxxx>
>
> As we know, using a PTE on x86 with cleared access-bit (aka young-bit)
> takes ~600 cycles more than when the access-bit is set. At the same
> time, setting the access-bit for memory that is not used (e.g.,
> prefetched) can introduce greater overheads, as the prefetched memory is
> reclaimed later than it should be.
>
> Userfaultfd currently does not set the access-bit (excluding the
> huge-pages case). Arguably, it is best to let the uffd monitor control
> whether the access-bit should be set or not. The expected use is for the
> monitor to request userfaultfd to set the access-bit when the copy
> operation is done to resolve a page-fault, and not to set the young-bit
> when the memory is prefetched.

Thinking out loud about existing users: postcopy live migration in QEMU
has two usages for placement of pages:

a) Resolving a fault. E.g., a VCPU might be waiting for resolution to
   make progress.
b) Background migration to converge without faults on all relevant pages.

I guess in a) we'd want UFFDIO_COPY_MODE_YOUNG; in b) we don't want it.

I wonder, however, whether instead of calling this "young", which implies
what the OS should or shouldn't do, we should define this as a hint that
the placed page is very likely to be accessed next.

I'm bad at naming, but UFFDIO_COPY_MODE_ACCESS_LIKELY would express what
I have in mind.

>
> Introduce UFFDIO_COPY_MODE_YOUNG to enable userspace to request the
> young bit to be set. For UFFDIO_CONTINUE and UFFDIO_ZEROPAGE set the bit
> unconditionally since the former is only used to resolve page-faults and
> the latter would not benefit from not setting the access-bit.
>
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
> Cc: Peter Xu <peterx@xxxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Mike Rapoport <rppt@xxxxxxxxxxxxx>
> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
>
> ---
>
> There are 2 possible enhancements:
>
> 1. Use the flag to decide on whether to mark the PTE as dirty (for
>    writable PTEs). I guess that setting the dirty-bit is as expensive as
>    setting the access-bit, and setting it introduces similar tradeoffs,
>    as mentioned above.
>
> 2. Introduce a similar mode for write-protect and use this information
>    for setting both the young and dirty bits. Makes one wonder whether
>    mprotect() should also set the bit in certain cases...

I wonder if UFFDIO_COPY_MODE_READ_ACCESS_LIKELY vs.
UFFDIO_COPY_WRITE_ACCESS_LIKELY could even make sense. I feel like it
could.

For example, QEMU knows if a page fault it's resolving was due to a read
or a write fault and could use that information accordingly. Of course,
when we are resolving a read fault, we can't know for sure whether a
write fault will follow immediately after.

Especially in the context of UFFDIO_ZEROPAGE,
UFFDIO_ZEROPAGE_WRITE_ACCESS_LIKELY could ... not place the zeropage but
instead populate an actual page and mark it accessed+dirty. I even have
a use case for that ;)

The kernel could decide how to treat these hints -- for example, if it
doesn't want user space to mess with access/dirty bits, it could just
mostly ignore the hints.

-- 
Thanks,

David / dhildenb
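
P.S. To make the hint idea a bit more concrete, here is a rough sketch of how a monitor like QEMU might choose hints per placement. Everything in it is hypothetical: the HINT_* names only mirror the naming floated above, their values are placeholders that do not come from any kernel header, and placement_reason / placement_hints() are made up for illustration. It only shows the decision logic, not the actual ioctl call.

```c
#include <stdint.h>

/*
 * Hypothetical hint bits. The names follow the naming discussed in this
 * thread; the values are placeholders, NOT taken from any kernel header.
 */
#define HINT_READ_ACCESS_LIKELY  (UINT64_C(1) << 0)
#define HINT_WRITE_ACCESS_LIKELY (UINT64_C(1) << 1)

/* Why the monitor is placing a page (hypothetical classification). */
enum placement_reason {
	PLACE_FAULT_READ,	/* a) resolving a read fault */
	PLACE_FAULT_WRITE,	/* a) resolving a write fault */
	PLACE_BACKGROUND,	/* b) background migration / prefetch */
};

/*
 * Pick the hints to pass along with a page placement. The kernel would
 * remain free to ignore them.
 */
static uint64_t placement_hints(enum placement_reason reason)
{
	switch (reason) {
	case PLACE_FAULT_READ:
		/* Someone is waiting to read this page right now. */
		return HINT_READ_ACCESS_LIKELY;
	case PLACE_FAULT_WRITE:
		/* Someone is waiting to write this page right now. */
		return HINT_WRITE_ACCESS_LIKELY;
	case PLACE_BACKGROUND:
		/* Nobody is waiting; don't claim an access is likely. */
		return 0;
	}
	return 0;
}
```

The monitor would then OR the result into the mode field of whatever placement request it issues, assuming the kernel ends up accepting such bits there.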