On Fri, Jan 07, 2022, Sean Christopherson wrote:
> On Fri, Jan 07, 2022, David Stevens wrote:
> > > > These are the type of pages which KVM is currently rejecting. Is this
> > > > something that KVM can support?
> > >
> > > I'm not opposed to it. My complaint is that this series is incomplete in that it
> > > allows mapping the memory into the guest, but doesn't support accessing the memory
> > > from KVM itself. That means for things to work properly, KVM is relying on the
> > > guest to use the memory in a limited capacity, e.g. isn't using the memory as
> > > general purpose RAM. That's not problematic for your use case, because presumably
> > > the memory is used only by the vGPU, but as is KVM can't enforce that behavior in
> > > any way.
> > >
> > > The really gross part is that failures are not strictly punted to userspace;
> > > the resulting error varies significantly depending on how the guest "illegally"
> > > uses the memory.
> > >
> > > My first choice would be to get the amdgpu driver "fixed", but that's likely an
> > > unreasonable request since it sounds like the non-KVM behavior is working as intended.
> > >
> > > One thought would be to require userspace to opt-in to mapping this type of memory
> > > by introducing a new memslot flag that explicitly states that the memslot cannot
> > > be accessed directly by KVM, i.e. can only be mapped into the guest. That way,
> > > KVM has an explicit ABI with respect to how it handles this type of memory, even
> > > though the semantics of exactly what will happen if userspace/guest violates the
> > > ABI are not well-defined. And internally, KVM would also have a clear touchpoint
> > > where it deliberately allows mapping such memslots, as opposed to the more implicit
> > > behavior of bypassing ensure_pfn_ref().
> >
> > Is it well defined when KVM needs to directly access a memslot?
>
> Not really, there's certainly no established rule.
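A minimal user-space model of the opt-in memslot idea, for illustration only. Every name below (`MEMSLOT_NO_KVM_ACCESS`, `struct memslot`, `can_map_pfn()`, `can_kvm_access()`) is hypothetical and is not part of the real KVM ABI; returning `-EFAULT` is likewise an assumption. It encodes two rules: a pfn that is not refcounted may be mapped into the guest only from a memslot that explicitly opted in, and any direct access by KVM itself to such a memslot fails with one fixed error instead of varying with how the guest used the memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memslot flag: "KVM itself will never access this memory;
 * it may only be mapped into the guest." Not a real KVM flag. */
#define MEMSLOT_NO_KVM_ACCESS (1u << 0)

/* Hypothetical, heavily simplified memslot. */
struct memslot {
	uint64_t base_gfn;
	uint64_t npages;
	uint32_t flags;
};

/* May a pfn backing this memslot be mapped into the guest?
 * Refcounted pages are always accepted; non-refcounted pages are
 * accepted only if userspace opted in via the hypothetical flag. */
static int can_map_pfn(const struct memslot *slot, bool pfn_is_refcounted)
{
	assert(slot);
	if (pfn_is_refcounted)
		return 0;
	if (slot->flags & MEMSLOT_NO_KVM_ACCESS)
		return 0;
	return -EFAULT;
}

/* May KVM itself (emulator, nested virtualization, paravirt features)
 * access this memslot directly? With the flag set, the answer is always
 * no, and the (assumed) -EFAULT can be punted to userspace. */
static int can_kvm_access(const struct memslot *slot)
{
	assert(slot);
	return (slot->flags & MEMSLOT_NO_KVM_ACCESS) ? -EFAULT : 0;
}
```

Whether KVM should fail such accesses, and with which error, is exactly what the thread leaves open; the sketch simply picks one option so the resulting ABI is easy to see.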
>
> > At least for x86, it looks like most of the use cases are related to nested
> > virtualization, except for the call in emulator_cmpxchg_emulated.
>
> The emulator_cmpxchg_emulated() will hopefully go away in the nearish future[*].

Forgot the link...

[*] https://lore.kernel.org/all/YcG32Ytj0zUAW%2FB2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

> Paravirt features that communicate between guest and host via memory are the other
> case that often maps a pfn into KVM.
>
> > Without being able to specifically state what should be avoided, a flag like
> > that would be difficult for userspace to use.
>
> Yeah :-( I was thinking KVM could state the flag would be safe to use if and only
> if userspace could guarantee that the guest would use the memory for some "special"
> use case, but hadn't actually thought about how to word things.
>
> The best thing to do is probably to wait for kvm_vcpu_map() to be eliminated,
> as described in the changelogs for commits:
>
> 357a18ad230f ("KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache")
> 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status")
>
> Once that is done, everything in KVM will either access guest memory through the
> userspace hva, or via a mechanism that is tied into the mmu_notifier, at which
> point accessing non-refcounted struct pages is safe and just needs to worry about
> not corrupting _refcount.
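The closing point about `_refcount` can be illustrated with a user-space model. In the real kernel, `put_page()` frees a page once its count drops to zero, so a blind get/put pair on a page whose count is 0 would take it from 0 to 1 and back to 0, releasing memory the driver still owns. The sketch below follows the semantics of the kernel's `get_page_unless_zero()`: take a reference only when the count is already nonzero, and otherwise use the page without touching the count. `struct fake_page`, `try_get_ref()`, `put_ref()` and `access_page()` are hypothetical stand-ins, not kernel APIs, and the sketch assumes some other mechanism (in the plan above, one tied into the mmu_notifier) keeps the page valid while it is in use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the part of struct page that matters here.
 * A count of 0 models a page whose owner (e.g. a driver) does not
 * refcount it. */
struct fake_page {
	atomic_int refcount;
};

/* Take a reference only if the count is already nonzero, mirroring the
 * semantics of get_page_unless_zero(). Returns true if a reference was
 * taken and must later be dropped with put_ref(). */
static bool try_get_ref(struct fake_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void put_ref(struct fake_page *page)
{
	int old = atomic_fetch_sub(&page->refcount, 1);

	/* Dropping a reference we never held would corrupt the count. */
	assert(old > 0);
}

/* Hypothetical helper: pin the page for the duration of the access only
 * if it is refcounted. A page with a zero count is used without touching
 * the count; its lifetime is assumed to be guaranteed elsewhere, which
 * this model does not implement. Returns whether a reference was taken. */
static bool access_page(struct fake_page *page)
{
	bool took_ref = try_get_ref(page);

	/* ... read or write the page contents here ... */

	if (took_ref)
		put_ref(page);
	return took_ref;
}
```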