On Thu, Jan 30, 2025 at 11:28:11AM -0500, Peter Xu wrote:
> On Sun, Jan 26, 2025 at 11:34:29AM +0800, Xu Yilun wrote:
> > > Definitely not suggesting to install an invalid pointer anywhere. The
> > > mapped pointer will still be valid for gmem for example, but the fault
> > > isn't. We need to differentiate two things: (1) the virtual address
> > > mapping, and (2) permission and accesses on the folios / pages of the
> > > mapping. Here I think it's okay if the host pointer is correctly
> > > mapped.
> > >
> > > For your private MMIO use case, my question is: if there's no host
> > > pointer to be mapped anyway, then what's the benefit of making the MR
> > > ram=on? Can we simply make it a normal IO memory region? The only
> > > benefit of a
> >
> > Guest access to a normal IO memory region would be emulated by QEMU,
> > while private assigned MMIO requires guest direct access via Secure EPT.
> >
> > It seems the existing code doesn't support guest direct access if
> > mr->ram == false:
>
> Ah, it's about this, OK.
>
> I am not sure what's the best approach, but IMHO it's still better we
> stick with the host pointer always being available when ram=on. OTOH,
> VFIO private regions may be able to provide a special mark somewhere,
> just like when romd_mode was done previously as below (qemu
> 235e8982ad39), so that KVM should still apply these MRs even if they're
> not RAM.

Also good to me. (One way such a mark could be consumed is sketched
below, after the quoted commit message.)

> >
> > static void kvm_set_phys_mem(KVMMemoryListener *kml,
> >                              MemoryRegionSection *section, bool add)
> > {
> >     [...]
> >
> >     if (!memory_region_is_ram(mr)) {
> >         if (writable || !kvm_readonly_mem_allowed) {
> >             return;
> >         } else if (!mr->romd_mode) {
> >             /* If the memory device is not in romd_mode, then we actually
> >              * want to remove the kvm memory slot so all accesses will
> >              * trap. */
> >             add = false;
> >         }
> >     }
> >
> >     [...]
> >
> >     /* register the new slot */
> >     do {
> >
> >         [...]
> >
> >         err = kvm_set_user_memory_region(kml, mem, true);
> >     }
> > }
>
> > > ram=on MR is, IMHO, being able to be accessed as RAM-like. If there's
> > > no host pointer at all, I don't yet understand how that helps private
> > > MMIO work.
> >
> > I expect private MMIO to be inaccessible from the host, but accessible
> > from the guest, so it has kvm_userspace_memory_region2 set. That means
> > resolving its PFN during an EPT fault cannot depend on the host
> > pointer.
> >
> > https://lore.kernel.org/all/20250107142719.179636-1-yilun.xu@xxxxxxxxxxxxxxx/
>
> I'll leave this to KVM experts, but I actually didn't follow exactly why
> the mmu notifier is an issue, as I thought that was per-mm anyway, and
> KVM should logically be able to skip all VFIO private MMIO regions if
> affected.

I think this creates a logical inconsistency. You build the private MMIO
EPT mapping at fault time based on the HVA<->HPA mapping, but then don't
follow HVA<->HPA mapping changes. Why should KVM trust the mapping at
fault time but not at mmu-notifier time?

> This is a comment on this part of your commit message:
>
>     Relying on the userspace mapping also means the private MMIO
>     mapping should follow userspace mapping changes via mmu_notifier.
>     This conflicts with the current design that mmu_notifier never
>     impacts private mappings. It also makes no sense to support
>     mmu_notifier just for private MMIO; the private MMIO mapping should
>     be fixed once the CoCo-VM accepts the private MMIO, and any later
>     mapping change without guest permission should be invalid.
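For concreteness, here is a minimal sketch of how such a "special mark"
might be consumed inside kvm_set_phys_mem(), in the same spirit as the
romd_mode special-casing quoted above. memory_region_is_private_mmio()
is hypothetical; it stands in for whatever mark VFIO would eventually
set on private MMIO regions:

    /* Sketch only: memory_region_is_private_mmio() does not exist in
     * QEMU today; it stands in for the proposed "special mark". */
    if (!memory_region_is_ram(mr)) {
        if (memory_region_is_private_mmio(mr)) {
            /* Keep the slot registered so KVM can map the region into
             * the Secure EPT, even though there is no host RAM behind
             * it. */
        } else if (writable || !kvm_readonly_mem_allowed) {
            return;
        } else if (!mr->romd_mode) {
            /* Not in romd_mode: remove the slot so all accesses trap. */
            add = false;
        }
    }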
>
> So I don't yet see a hard-no of reusing userspace mapping even if
> they're not faultable as of now - what if they can be faultable in the
> future? I am not sure..

The first commit of guest_memfd puts a lot of emphasis on the benefit of
decoupling the KVM mapping from the host mapping. My understanding is
that even if guest_memfd becomes faultable later, KVM should still work
without relying on the userspace mapping.

>
> OTOH, I also don't think we need KVM_SET_USER_MEMORY_REGION3 anyway..
> The _REGION2 API is already smart enough to leave some reserved fields:
>
> /* for KVM_SET_USER_MEMORY_REGION2 */
> struct kvm_userspace_memory_region2 {
>         __u32 slot;
>         __u32 flags;
>         __u64 guest_phys_addr;
>         __u64 memory_size;
>         __u64 userspace_addr;
>         __u64 guest_memfd_offset;
>         __u32 guest_memfd;
>         __u32 pad1;
>         __u64 pad2[14];
> };
>
> I think we _could_ reuse some pad*? Reusing the guest_memfd field
> sounds error-prone to me.

It truly is. I'm expecting some suggestions here. (One illustrative
possibility is sketched at the end of this mail.)

Thanks,
Yilun

>
> Not sure whether it would have been easier if it had been fd +
> fd_offset rather than guest_memfd* from the start. But I guess when
> introducing _REGION2 we didn't expect private MMIO regions to come so
> soon..
>
> Thanks,
>
> --
> Peter Xu
>
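For illustration, one way the "reuse some pad*" direction could look
from userspace. KVM_MEM_PRIVATE_MMIO and the pad2[] layout below are
invented purely as a sketch, not an existing or proposed uAPI, and
slot, gpa, size, private_mmio_fd, mmio_offset and vm_fd are assumed
from context:

    #include <linux/kvm.h>
    #include <stdio.h>
    #include <sys/ioctl.h>

    /* Sketch only: KVM_MEM_PRIVATE_MMIO and the pad2[] reuse are
     * hypothetical; the guest_memfd fields keep their current meaning,
     * so existing users are unaffected. */
    struct kvm_userspace_memory_region2 region = {
        .slot            = slot,
        .flags           = KVM_MEM_PRIVATE_MMIO,      /* hypothetical */
        .guest_phys_addr = gpa,
        .memory_size     = size,
        /* pad2[0]/pad2[1] repurposed as private-MMIO fd + offset. */
        .pad2            = { (__u64)private_mmio_fd, mmio_offset },
    };

    if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region) < 0) {
        perror("KVM_SET_USER_MEMORY_REGION2");
    }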