On Mon, Mar 03, 2025 at 03:01:38PM -0500, Mathieu Desnoyers wrote:
> On 2025-02-28 17:32, Peter Xu wrote:
> > On Fri, Feb 28, 2025 at 12:53:02PM -0500, Mathieu Desnoyers wrote:
> > > On 2025-02-28 11:32, Peter Xu wrote:
> > > > On Fri, Feb 28, 2025 at 09:59:00AM -0500, Mathieu Desnoyers wrote:
> > > > > For the VM use-case, I wonder if we could just add a userfaultfd
> > > > > "COW" event that would notify userspace when a COW happens ?
> > > >
> > > > I don't know what's the best for KSM and how well this will work, but we
> > > > have such event for years.. See UFFDIO_REGISTER_MODE_WP:
> > > >
> > > > https://man7.org/linux/man-pages/man2/userfaultfd.2.html
> > >
> > > userfaultfd UFFDIO_REGISTER only seems to work if I pass an address
> > > resulting from a mmap mapping, but returns EINVAL if I pass a
> > > page-aligned address which sits within a private file mapping
> > > (e.g. executable data).
> >
> > Yes, so far sync traps only supports RAM-based file systems, or anonymous.
> > Generic private file mappings (that stores executables and libraries) are
> > not yet supported.
> >
> > >
> > > Also, I notice that do_wp_page() only calls handle_userfault
> > > VM_UFFD_WP when vm_fault flags does not have FAULT_FLAG_UNSHARE
> > > set.
> >
> > AFAICT that's expected, unshare should only be set on reads, never writes.
> > So uffd-wp shouldn't trap any of those.
> >
> > >
> > > AFAIU, as it stands now userfaultfd would not help tracking COW faults
> > > caused by stores to private file mappings. Am I missing something ?
> >
> > I think you're right. So we have UFFD_FEATURE_WP_ASYNC that should work on
> > most mappings. That one is async, though, so more like soft-dirty. It
> > might be doable to try making it sync too without a lot of changes based on
> > how async tracking works.
>
> I'm looking more closely at admin-guide/mm/pagemap.rst and it appears to
> be a good fit. Here is what I have in mind to replace the ksmd scanning
> thread for the VM use-case by a purely user-space driven scanning:
>
> Within qemu or similar user-space process:
>
> 1) Track guest memory with the userfaultfd UFFD_FEATURE_WP_ASYNC feature and
>    UFFDIO_REGISTER_MODE_WP mode.
>
> 2) Protect user-space memory with the PAGEMAP_SCAN ioctl PM_SCAN_WP_MATCHING flag
>    to detect memory which stays invariant for a long time.
>
> 3) Use the PAGEMAP_SCAN ioctl with PAGE_IS_WRITTEN to detect which pages are written to.
>    Keep track of memory which is frequently modified, so it can be left alone and
>    not write-protected nor merged anymore.
>
> 4) Whenever pages stay invariant for a given lapse of time, merge them with the new
>    madvise(2) KSM_MERGE behavior.
>
> Let me know if that makes sense.

I can't speak of how KSM should go from there, but from userfault tracking
POV, that makes sense to me.

Thanks,

-- 
Peter Xu
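
For illustration only, the user-space bookkeeping implied by steps 3) and 4) of the quoted proposal could look roughly like the sketch below. It covers just the per-page policy: the `written` flags stand in for what a PAGEMAP_SCAN pass would report through PAGE_IS_WRITTEN, and the caller would be the one to act on the returned indices (for example with the proposed madvise(2) merge behavior, which does not exist yet). All names and thresholds here (`page_state`, `page_update`, `scan_pass`, `STABLE_SCANS_TO_MERGE`, `HOT_WRITES_TO_IGNORE`) are hypothetical and do not come from the thread or from any kernel header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical policy knobs, chosen only for this sketch. */
#define STABLE_SCANS_TO_MERGE 3 /* consecutive unwritten scans before merging */
#define HOT_WRITES_TO_IGNORE  4 /* written scans before a page is left alone */

enum page_action { PAGE_KEEP_TRACKING, PAGE_MERGE, PAGE_IGNORE };

struct page_state {
    uint8_t stable_scans; /* consecutive scans without a write */
    uint8_t write_hits;   /* scans in which the page was seen written */
    bool ignored;         /* too hot: no longer write-protected nor merged */
    bool merged;          /* already handed over for merging */
};

/*
 * Update one page with the outcome of one scan pass. `written` models the
 * PAGE_IS_WRITTEN result for that page (assumption: one flag per page).
 */
static enum page_action page_update(struct page_state *ps, bool written)
{
    assert(ps != NULL);
    if (ps->ignored)
        return PAGE_IGNORE;
    if (written) {
        ps->stable_scans = 0;
        ps->merged = false; /* a store means the page diverged again */
        if (++ps->write_hits >= HOT_WRITES_TO_IGNORE) {
            ps->ignored = true;
            return PAGE_IGNORE;
        }
        return PAGE_KEEP_TRACKING;
    }
    if (ps->merged)
        return PAGE_KEEP_TRACKING;
    if (++ps->stable_scans >= STABLE_SCANS_TO_MERGE) {
        ps->merged = true;
        return PAGE_MERGE;
    }
    return PAGE_KEEP_TRACKING;
}

/*
 * Apply one scan pass to `n` pages. Indices of pages that just became
 * merge candidates are stored in `merge_idx` (capacity `n`); the count
 * of such pages is returned.
 */
static size_t scan_pass(struct page_state *pages, const bool *written,
                        size_t n, size_t *merge_idx)
{
    size_t nr_merge = 0;
    for (size_t i = 0; i < n; i++) {
        if (page_update(&pages[i], written[i]) == PAGE_MERGE)
            merge_idx[nr_merge++] = i;
    }
    return nr_merge;
}
```

A caller would zero-initialize one `page_state` per tracked page, run `scan_pass()` after each scan interval, and re-arm write protection only for pages that are not marked `ignored`. Whether counters like these are the right heuristic for deciding what is "frequently modified" is an open design choice, not something the thread settles.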