Re: A question about how the KVM emulates the effect of guest MTRRs on AMD platforms

On Fri, Oct 27, 2023 at 04:13:36PM -0700, Sean Christopherson wrote:
> +Yan
> 
> On Sat, Aug 12, 2023, Yibo Huang wrote:
> > Hi the KVM community,
> > 
> > I am sending this email to ask about how the KVM emulates the effect of guest
> > MTRRs on AMD platforms.
> > 
> > Since there is no hardware support for guest MTRRs, the VMM can simulate
> > their effect by altering the memory types in the EPT/NPT. From my
> > understanding, this is exactly what the KVM does for Intel platforms. More
> > specifically, in arch/x86/kvm/mmu/spte.c #make_spte(), the KVM tries to
> > respect the guest MTRRs by calling #kvm_x86_ops.get_mt_mask() to get the
> > memory types indicated by the guest MTRRs and applying that to the EPT. For
> > Intel platforms, the implementation of #kvm_x86_ops.get_mt_mask() is
> > #vmx_get_mt_mask(), which calls the #kvm_mtrr_get_guest_memory_type() to get
> > the memory types indicated by the guest MTRRs.
> 
> KVM doesn't always honor guest MTRRs; KVM only does all of this if there is a
> passthrough device with non-coherent DMA attached to the VM.  There's actually
> an outstanding issue with virtio-gpu where non-coherent GPUs are flaky due to
> KVM not stuffing the EPT memtype because KVM isn't aware of the non-coherent DMA.
> 
> > However, on AMD platforms, the KVM does not implement
> > #kvm_x86_ops.get_mt_mask() at all, so it just returns zero. Does it mean that
> > the KVM does not use the NPT to emulate the effect of guest MTRRs on AMD
> > platforms? I tried but failed to find out how KVM handles this on AMD platforms.
> 
> Correct.  The short answer is that SVM+NPT obviates the need to emulate guest
> MTRRs for real world guest workloads.
> 
> The shortcomings of VMX+EPT are that (a) guest CR0.CD isn't virtualized by
> hardware and (b) AFAIK, if the guest uses PAT=WC to access memory that the
> host has accessed with PAT=WB (and MTRR=WB), the CPU will *not* snoop caches
> on the guest access.
> 
> SVM on the other hand fully virtualizes CR0.CD, and NPT is quite clever in how
> it handles guest WC:
> 
>   A new memory type WC+ is introduced. WC+ is an uncacheable memory type, and
>   combines writes in write-combining buffers like WC. Unlike WC (but like the CD
>   memory type), accesses to WC+ memory also snoop the caches on all processors
>   (including self-snooping the caches of the processor issuing the request) to
>   maintain coherency. This ensures that cacheable writes are observed by WC+
>   accesses.
> 
> And VMRUN (and #VMEXIT) flush the WC buffers, e.g. if the guest is using WB and
> the host is using WC, things will still work as expected (well, maybe not for
> cases where the host is writing and the guest is reading from different CPUs).
> Anyways, evidenced by the lack of bug reports over the last decade, for practical
> purposes snooping the caches on guest WC accesses is sufficient.
> 
> Hrm, but typing all that out, I have absolutely no idea why VMX+EPT cares about
> guest MTRRs.  Honoring guest PAT I totally get, but the guest MTRRs make no sense.
I think KVM honors guest MTRRs because VMX+EPT relies on the guest to issue clflush
or wbinvd in cases such as EPT = WC/UC with guest PAT = WB for non-coherent DMA
devices. So, to keep the guest driver's view of the memory type consistent with the
host's effective memory type, KVM currently programs the EPT with the value of the
guest MTRRs.

If the EPT honored only the guest PAT and were set to WB while the guest MTRR is WC
or UC, a guest driver that believes the effective memory type is WC or UC would not
perform the cache flushes that are needed.
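
To make the current policy concrete, here is a minimal sketch of the decision
described in this thread. It is not KVM's code: ept_memtype_bits() and the MT_*
names are hypothetical, CR0.CD and MMIO handling are left out, and only the memory
type encodings and the EPT bit positions (memtype in bits 5:3, IPAT in bit 6) are
architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural memory type encodings (same values for MTRRs and EPT). */
enum memtype { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };

#define EPT_MT_SHIFT 3             /* bits 5:3 of an EPT leaf entry: memory type */
#define EPT_IPAT_BIT (1ULL << 6)   /* bit 6: ignore guest PAT */

/*
 * Hypothetical helper, not KVM's function: returns the memtype/IPAT bits
 * for an EPT leaf entry under the policy discussed in this thread.
 */
static uint64_t ept_memtype_bits(bool has_noncoherent_dma,
                                 enum memtype guest_mtrr_type)
{
    assert(((unsigned)guest_mtrr_type & ~7u) == 0);  /* must fit in 3 bits */

    /* No non-coherent DMA: force WB and ignore the guest PAT. */
    if (!has_noncoherent_dma)
        return ((uint64_t)MT_WB << EPT_MT_SHIFT) | EPT_IPAT_BIT;

    /* Non-coherent DMA: mirror the guest MTRR type and leave IPAT clear. */
    return (uint64_t)guest_mtrr_type << EPT_MT_SHIFT;
}
```

In this model, honoring only the guest PAT would mean returning
MT_WB << EPT_MT_SHIFT with IPAT clear in the second branch, regardless of
guest_mtrr_type.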

But I don't see Linux guest drivers checking the combination of guest MTRR + guest
PAT directly.

Instead, when a Linux guest driver wants to program a PAT type, it checks the guest
MTRRs to see whether that is feasible.

remap_pfn_range
  reserve_pfn_range
    memtype_reserve
      pat_x_mtrr_type

So, before the guest programs the PAT to WB, it should find that the guest MTRR is
WC/UC and either return WC/UC as the PAT type or just fail.

In this regard, I think honoring only the guest PAT also makes sense.
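
A rough model of that guest-side check, as a sketch only: the types and helpers
below (struct mtrr_range, lookup_mtrr(), pat_x_mtrr()) are hypothetical and are not
the kernel's definitions. It assumes a single variable MTRR range with a WB default,
and it assumes that a WB request over non-WB MTRR memory is downgraded to UC-, which
is how I remember pat_x_mtrr_type() behaving.

```c
#include <assert.h>
#include <stdint.h>

enum mtrr_type { MTRR_UC = 0, MTRR_WC = 1, MTRR_WT = 4, MTRR_WP = 5, MTRR_WB = 6 };
enum pat_mode { PAT_WB, PAT_WC, PAT_UC_MINUS, PAT_UC };  /* hypothetical subset */

/* Hypothetical stand-in for the guest MTRR state: one range, WB elsewhere. */
struct mtrr_range { uint64_t start, end; enum mtrr_type type; };

static enum mtrr_type lookup_mtrr(const struct mtrr_range *r,
                                  uint64_t start, uint64_t end)
{
    assert(start < end);
    /* Simplification: any overlap with the range takes the range's type. */
    if (start < r->end && end > r->start)
        return r->type;
    return MTRR_WB;
}

/* Only a WB request consults the MTRRs; other requests pass through. */
static enum pat_mode pat_x_mtrr(const struct mtrr_range *r,
                                uint64_t start, uint64_t end,
                                enum pat_mode req)
{
    if (req == PAT_WB && lookup_mtrr(r, start, end) != MTRR_WB)
        return PAT_UC_MINUS;  /* assumed downgrade, see the note above */
    return req;
}
```

With a UC range at [0x1000, 0x2000), a WB request inside the range comes back as
UC-, while a WB request outside it stays WB. So in this model the guest never ends
up with PAT = WB on memory its MTRRs mark as WC/UC.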

> E.g. I have a very hard time believing a real world guest kernel mucks with the
> MTRRs to set up DMA.  And again, this is supported by the absence of bug reports
> on AMD.
> 
> 
> Yan,
> 
> You've been digging into this code recently, am I forgetting something because
> it's late on a Friday?  Or have we been making the very bad assumption that KVM
> code from 10+ years ago actually makes sense?  I.e. for non-coherent DMA, can we
> delete all of the MTRR insanity and simply clear IPAT?
I'm not sure whether there are guest drivers that program the PAT as WB but treat
the memory as UC.
In theory, honoring guest MTRRs is the safest way.
Do you think a complete analysis of all the corner cases is warranted?
I'd be happy if we could remove all the MTRR stuff in VMX :)






