Re: A question about how the KVM emulates the effect of guest MTRRs on AMD platforms

Sean Christopherson <seanjc@xxxxxxxxxx> · Mon, 30 Oct 2023 21:52:07 +0000

On Mon, Oct 30, 2023, Yibo Huang wrote:
> Well, I agree with Sean’s opinion that SVM+NPT obviates the need to emulate
> guest MTRRs for real-world guest workloads.  However, from my own experience,
> I think KVM does emulate the effect of guest MTRRs on AMD platforms.
> 
> Here's the reason:
> 2 months ago, I was trying to attach a QEMU ivshmem device to my VMs running
> on Intel and AMD machines.  Since ivshmem is an emulated memory-backed
> device, it should be cacheable to get the best performance.
> Interestingly, I found that the memory region associated with ivshmem (PCIe
> BAR 2 region) was cacheable on Intel machines, but not cacheable on AMD
> machines.
> After some digging, I found that this was because of the guest MTRRs - on AMD
> machines, BIOS or guest OS (not sure who did this) set the memory region of
> ivshmem as non-cacheable in guest MTRRs (but cacheable in guest PAT). This
> was supported by the fact that ivshmem became cacheable after removing the
> corresponding guest MTRR (reg02) on AMD machines (using "echo -n disable=2 >
> /proc/mtrr").
> Additionally, the reason why ivshmem was cacheable on Intel machines was that
> BIOS or guest OS didn’t set ivshmem as uncacheable in guest MTRRs on Intel
> machines (not sure why though).

What test(s) did you run to determine whether or not the memory was truly cacheable?
KVM emulates the MTRR MSRs themselves, e.g. the guest can read and write MTRRs,
and the guest will _think_ memory has a certain memtype, but that doesn't necessarily
have any impact on the memtype used by the CPU.
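
To make "the guest will _think_ memory has a certain memtype" concrete, here is
a minimal sketch of the architectural variable-range MTRR lookup, i.e. what the
guest computes from the MSR values KVM lets it read and write.  It ignores
fixed-range MTRRs and the MTRR enable bits, and the names (struct var_mtrr,
guest_visible_memtype) are made up for illustration, not KVM code.  The result
says nothing about the memtype the CPU actually uses under NPT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MTRR memory type encodings. */
enum { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };

/* One variable-range MTRR pair as raw MSR values (hypothetical container). */
struct var_mtrr {
	uint64_t physbase;	/* bits 7:0 = type, bits 12 and up = base */
	uint64_t physmask;	/* bit 11 = valid, bits 12 and up = mask */
};

#define MTRR_VALID	(1ULL << 11)
#define MTRR_ADDR_BITS	(~0xfffULL)

/*
 * Return the memtype the guest's own MTRRs describe for @gpa.
 * Overlap rules: UC always wins, WT beats WB; any other overlap is
 * architecturally undefined, so this sketch just keeps the first match.
 */
static int guest_visible_memtype(const struct var_mtrr *regs, size_t nregs,
				 int def_type, uint64_t gpa)
{
	int type = -1;

	assert(regs != NULL || nregs == 0);

	for (size_t i = 0; i < nregs; i++) {
		uint64_t mask = regs[i].physmask & MTRR_ADDR_BITS;
		int t = (int)(regs[i].physbase & 0xff);

		if (!(regs[i].physmask & MTRR_VALID))
			continue;	/* register disabled */
		if ((gpa & mask) != (regs[i].physbase & mask))
			continue;	/* address outside this range */

		if (t == MT_UC)
			return MT_UC;
		if (type == -1)
			type = t;
		else if ((type == MT_WB && t == MT_WT) ||
			 (type == MT_WT && t == MT_WB))
			type = MT_WT;
	}

	/* No variable range matched: fall back to the default type. */
	return type == -1 ? def_type : type;
}
```

With a single valid UC range covering a BAR, the lookup returns UC for
addresses inside it; clearing the valid bit (roughly what disabling the
register via /proc/mtrr does) makes the same address fall back to the default
type.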

> Below is the output of “cat /proc/mtrr” on my VMs running on AMD machines. By
> removing reg02, ivshmem BAR 2 region became cacheable.
> 
> 
> So in my opinion, the above phenomenon suggests that KVM does honor guest
> MTRRs on AMD platforms.

Heh, this isn't opinion.  Unless you're running a very specific 10-year-old kernel,
or a custom KVM build, KVM simply doesn't propagate guest MTRRs into NPT.

And unless your setup also has non-coherent DMA attached to the device, KVM doesn't
honor guest MTRRs for EPT either (AFAICT, QEMU ivshmem doesn't require VFIO).

It's definitely possible that disabling a guest MTRR resulted in memory becoming
cacheable, but unless there's some very, very magical code hiding, it's not because
KVM actually fully virtualizes guest MTRRs on AMD.

E.g. before commit 9a3768191d95 ("KVM: x86/mmu: Zap SPTEs on MTRR update iff guest
MTRRs are honored"), which hasn't even made its way to Linus' (or Paolo's) tree yet,
KVM unnecessarily zapped all NPT entries on MTRR changes.  Zapping NPT entries
could have cleared some weird TLB state, or perhaps even wiped out buggy KVM NPT
entries.

And on AMD, hardware virtualizes gCR0.CD, i.e. puts the caches into no-fill mode
when guest CR0.CD=1.  But Intel CPUs completely ignore guest CR0.CD, i.e. punt it
to software, and under QEMU, for all intents and purposes KVM never honors guest
CR0.CD for VMX.  It seems highly unlikely that something in the guest left
CR0.CD=1, but it's possible.  And then the guest kernel's process of toggling
CR0.CD when doing MTRR updates would end up clearing CR0.CD and thus re-enable
caching.

> The thing was that I could not find any KVM code related to emulating guest
> MTRRs on AMD platforms, which was the reason why I decided to send the
> initial email asking about it.
> 
> I found this in the AMD64 Architecture Programmer’s Manual Volumes 1–5 (page
> 553): 
> 
> "Table 15-19 shows how guest and host PAT types are combined into an
> effective PAT type. When interpreting this table, recall (a) that guest and
> host PAT types are not combined when nested paging is disabled and (b) that
> the intent is for the VMM to use its PAT type to simulate guest MTRRs.”
> 
> Does this mean that AMD expects the VMM to emulate the effect of guest MTRRs
> by altering the host PAT types?

Yes.  Which is exactly what KVM did in commit 3c2e7f7de324 ("KVM: SVM: use NPT
page attributes"), which was reverted a few months after it was introduced.

> I am not sure if I misunderstood something. But I can reproduce the example I
> mentioned above if you would like to look into it.

Yes, it would be helpful to confirm what's going on.
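
One way to check the memtype the CPU really uses, independent of what the MTRR
MSRs claim, is to time repeated reads of the region from inside the guest.
Below is a rough sketch under stated assumptions: ns_per_read is a made-up
helper, the 64-byte stride assumes a 64-byte cache line, and how the BAR gets
mapped into the test program is left out.  Cached memory should show a large
drop in per-read cost after the first pass, uncached memory should not.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/*
 * Average nanoseconds per read when sweeping @buf @passes times, touching
 * one byte per 64-byte line.  The volatile qualifier keeps the compiler from
 * dropping the loads.
 */
static double ns_per_read(volatile const uint8_t *buf, size_t len, int passes)
{
	struct timespec t0, t1;
	uint64_t sink = 0;
	size_t reads = 0;
	double ns;

	assert(buf != NULL && len >= 64 && passes > 0);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int p = 0; p < passes; p++) {
		for (size_t i = 0; i < len; i += 64) {
			sink += buf[i];
			reads++;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	(void)sink;

	ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
	     (double)(t1.tv_nsec - t0.tv_nsec);
	return ns / (double)reads;
}
```

Run it once against ordinary guest RAM as a baseline and once against the
ivshmem BAR mapping, before and after disabling reg02; a region small enough
to fit in the cache makes the difference easiest to see.  The absolute numbers
are noisy in a VM, so only order-of-magnitude differences are meaningful.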