Re: A question about how the KVM emulates the effect of guest MTRRs on AMD platforms

On Wed, Nov 01, 2023, Yan Zhao wrote:
> On Tue, Oct 31, 2023 at 08:14:41AM -0700, Sean Christopherson wrote:
> > FWIW, I don't think that page aliasing with WC/UC actually causes machine checks.
> > What does result in #MC (assuming things haven't changed in the last few years)
> > is accessing MMIO using WB and other cacheable memtypes, e.g. map the host APIC
> > with WB and you should see #MCs.  I suspect this is what people encountered years
> > ago when KVM attempted to honor guest MTRRs at all times.  E.g. the "full" MTRR
> > virtualization patch that got reverted deliberately allowed the guest to control
> > the memtype for host MMIO.
> > 
> > The SDM makes aliasing sound super scary, but then has footnotes where it explicitly
> > requires the CPU to play nice with aliasing, e.g. if MTRRs are *not* UC but the
> > effective memtype is UC, then the CPU is *required* to snoop caches:
> >
> Yes, I tried the combinations below; none of them triggers #MC.
> - effective memory type for guest access is WC, and that for host access is UC
> - effective memory type for guest access is UC, and that for host access is WC
> - effective memory type for guest access is UC, and that for host access is WB
> 
> >   2. The UC attribute came from the page-table or page-directory entry and
> >      processors are required to check their caches because the data may be cached
> >      due to page aliasing, which is not recommended.
> > 
> > Lack of snooping can effectively cause data corruption and ordering issues, but
> > at least for WC/UC vs. WB I don't think there are actual #MC problems with aliasing.
> > 
> Even no #MC on guest RAM?
> E.g. what if guest effective memory type is UC/WC, and host effective memory type
> is WB?
> (I tried on my machines with guest PAT=WC + host PAT=WB and saw no #MC, but I'm not
> sure whether I'm missing something or it only holds in my specific environment.)
> 
> If no #MC, could EPT type of guest RAM also be set to WB (without IPAT) even
> without non-coherent DMA?

No, there are snooping/ordering issues on Intel and, to a lesser extent, on AMD.  AMD's
WC+ solves the most straightforward cases, e.g. WC+ snoops caches, and VMRUN and
#VMEXIT flush the WC buffers to ensure that guest writes are visible on #VMEXIT
(and vice versa).  That may or may not be sufficient for multi-threaded use cases,
but I've no idea if there is actually anything to worry about on that front.  I
think there's also a flaw with the guest using UC, which IIUC doesn't snoop caches,
i.e. the guest could get stale data.

AFAIK, Intel CPUs don't provide anything like WC+, so KVM would have to provide
something similar to safely let the guest control memtypes.  Arguably, KVM should
have such mechanisms anyway, e.g. to make non-coherent DMA VMs more robust.

But even then, there's still the question of why, i.e. what would be the benefit
of letting the guest control memtypes when it's not required for functional
correctness, and would that benefit outweigh the cost?

> > > For CR0_CD=1,
> > > - w/o KVM_X86_QUIRK_CD_NW_CLEARED, it meets (b), but breaks (a).
> > > - w/  KVM_X86_QUIRK_CD_NW_CLEARED, with IPAT=1, it meets (a), but breaks (b);
> > >                                    with IPAT=0, it may break (a), but meets (b)
> > 
> > CR0.CD=1 is a mess above and beyond memtypes.  Huh.  It's even worse than I thought,
> > because according to the SDM, Atom CPUs don't support no-fill mode:
> > 
> >   3. Not supported In Intel Atom processors. If CD = 1 in an Intel Atom processor,
> >      caching is disabled.
> > 
> > Before I read that blurb about Atom CPUs, what I was going to say is that, AFAIK,
> > it's *impossible* to accurately virtualize CR0.CD=1 on VMX because there's no way
> > to emulate no-fill mode.
> > 
> > > > Discussion from the EPT+MTRR enabling thread[*] more or less confirms that Sheng
> > > > Yang was trying to resolve issues with passthrough MMIO.
> > > > 
> > > >  * Sheng Yang 
> > > >   : Do you mean host(qemu) would access this memory and if we set it to guest 
> > > >   : MTRR, host access would be broken? We would cover this in our shadow MTRR 
> > > >   : patch, for we encountered this in video ram when doing some experiment with 
> > > >   : VGA assignment. 
> > > > 
> > > > And in the same thread, there's also what appears to be confirmation of Intel
> > > > running into issues with Windows XP related to a guest device driver mapping
> > > > DMA with WC in the PAT.  Hilariously, Avi effectively said "KVM can't modify the
> > > > SPTE memtype to match the guest for EPT/NPT", which while true, completely overlooks
> > > > the fact that EPT and NPT both honor guest PAT by default.  /facepalm
> > > 
> > > My interpretation is that since guest PATs are in guest page tables,
> > > and with EPT/NPT guest page tables are not shadowed, it's not easy to
> > > check guest PATs to disallow host QEMU access to non-WB guest RAM.
> > 
> > Ah, yeah, your interpretation makes sense.
> > 
> > The best idea I can think of to support things like this is to have KVM grab the
> > effective PAT memtype from the host userspace page tables, shove that into the
> > EPT/NPT memtype, and then ignore guest PAT.  I don't know if that would actually
> > work though.
> Hmm, it might not work. E.g. in a GPU, some MMIOs are mapped as UC-, while some
> others are mapped as WC, even though they belong to the same BAR.
> I don't think the host can know which one to choose in advance.
> I think the same should also be true for RAM ranges: the guest can memremap to a
> memory type that the host doesn't know beforehand.

The goal wouldn't be to honor guest memtype, it would be to ensure correctness.
E.g. guest can do memremap all it wants, and KVM will always ignore the guest's
memtype.
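
To make the "ignore guest PAT" vs. "WB without IPAT" distinction concrete, here is
a minimal, standalone sketch of how the EPT memtype field and the IPAT bit decide
the effective memtype.  This is not KVM code; the helper names are made up for
illustration.  It assumes the architectural EPT leaf layout (memtype in bits 5:3,
IPAT in bit 6), guest CR0.CD=0, and it only models IPAT=1 and the EPT=WB row of the
SDM's MTRR/PAT combining rules.  Everything else is deliberately left unmodeled.

```c
#include <assert.h>
#include <stdint.h>

/* x86 memory type encodings (MTRR / PAT / EPT memtype field). */
enum memtype {
	MT_UC       = 0,
	MT_WC       = 1,
	MT_WT       = 4,
	MT_WP       = 5,
	MT_WB       = 6,
	MT_UC_MINUS = 7,	/* only meaningful as a PAT type */
};

#define EPT_MT_SHIFT	3		/* EPT leaf bits 5:3: memory type */
#define EPT_IPAT_BIT	(1ull << 6)	/* EPT leaf bit 6: ignore guest PAT */

/* Hypothetical helper: build the memtype portion of an EPT leaf entry. */
static uint64_t ept_memtype_bits(enum memtype mt, int ipat)
{
	assert(mt != MT_UC_MINUS);	/* UC- is not a valid EPT memtype */
	return ((uint64_t)mt << EPT_MT_SHIFT) | (ipat ? EPT_IPAT_BIT : 0);
}

/*
 * Hypothetical helper: effective memtype for a guest access, or -1 for
 * combinations this sketch does not model.
 */
static int effective_memtype(uint64_t ept_bits, enum memtype guest_pat)
{
	enum memtype ept_mt = (enum memtype)((ept_bits >> EPT_MT_SHIFT) & 0x7);

	/* IPAT=1: guest PAT is ignored, the EPT memtype always wins. */
	if (ept_bits & EPT_IPAT_BIT)
		return ept_mt;

	/* IPAT=0 with EPT=WB: the guest PAT type is used as-is (UC- => UC). */
	if (ept_mt == MT_WB)
		return guest_pat == MT_UC_MINUS ? MT_UC : guest_pat;

	return -1;	/* other EPT/PAT combinations: see the SDM tables */
}
```

With that, effective_memtype(ept_memtype_bits(MT_WB, 1), MT_WC) yields MT_WB, i.e.
the guest can memremap all it wants, whereas dropping IPAT lets the guest's WC
through, which is exactly the aliasing case being discussed above.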
