On Friday, July 19, 2024 1:09 AM, James Houghton wrote: > On Wed, Jul 17, 2024 at 8:34 AM Wang, Wei W <wei.w.wang@xxxxxxxxx> > wrote: > > > > On Thursday, July 11, 2024 7:42 AM, James Houghton wrote: > > > The first prong for enabling KVM Userfault support for x86 is to be > > > able to inform userspace of userfaults. We know when userfaults > > > occurs when > > > fault->pfn comes back as KVM_PFN_ERR_FAULT, so in > > > kvm_mmu_prepare_memory_fault_exit(), simply check if fault->pfn is > > > indeed KVM_PFN_ERR_FAULT. This means always setting fault->pfn to a > > > known value (I have chosen KVM_PFN_ERR_FAULT) before calling > > > kvm_mmu_prepare_memory_fault_exit(). > > > > > > The next prong is to unmap pages that are newly userfault-enabled. > > > Do this in kvm_arch_pre_set_memory_attributes(). > > > > Why is there a need to unmap it? > > I think a userfault is triggered on a page during postcopy when its > > data has not yet been fetched from the source, that is, the page is > > never accessed by guest on the destination and the page table leaf entry is > empty. > > > > You're right that it's not strictly necessary for implementing post-copy. This just > comes down to the UAPI we want: does ATTRIBUTE_USERFAULT mean "KVM > will be unable to access this memory; any attempt to access it will generate a > userfault" or does it mean "accesses to never-accessed, non-prefaulted > memory will generate a userfault." > > I think the former (i.e., the one implemented in this RFC) is slightly clearer and > slightly more useful. > > Userfaultfd does the latter: > 1. MAP_PRIVATE|MAP_ANONYMOUS + UFFDIO_REGISTER_MODE_MISSING: if > nothing is mapped (i.e., major page fault) 2. non-anonymous VMA + > UFFDIO_REGISTER_MODE_MISSING: if the page cache does not contain a page > 3. MAP_SHARED + UFFDIO_REGISTER_MODE_MINOR: if the page cache > *contains* a page, but we got a fault anyway > Yeah, you pointed a good reference here. I think it's beneficial to have different "modes" for a feature, so we don’t need to force everybody to carry on the unnecessary burden (e.g. unmap()). The design is targeted for postcopy support currently, we could start with what postcopy needs (similar to UFFDIO_REGISTER_MODE_MISSING), while leaving space for incremental addition of new modes for new usages. When there is a clear need for unmap(), it could be added under a new flag. > But in all of these cases, we have a way to start getting userfaults for already- > accessed memory: for (1) and (3), MADV_DONTNEED, and for (2), > fallocate(FALLOC_FL_PUNCH_HOLE). Those cases don't seem related to postcopy (i.e., faults are not brand new and they need to be handled locally). They could be added under a new flag later when the related usage is clear. > > Even if we didn't have MADV_DONTNEED (as used to be the case with > HugeTLB), we can use PROT_NONE to prevent anyone from mapping anything > in between an mmap() and a UFFDIO_REGISTER. This has been useful for me. Same for this case as well, plus gmem doesn't support mmap() at present. > > With KVM, we have neither of these tools (unless we include them here), > AFAIA.