Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX

"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> · Wed, 5 Feb 2025 03:51:00 +0000

+Min, can you comment?

3a3b12cbda ("UefiCpuPkg/MtrrLib: MtrrLibIsMtrrSupported always return FALSE in
TD-Guest") turned out to be problematic in practice.

Full thread:
https://lore.kernel.org/kvm/20250201005048.657470-1-seanjc@xxxxxxxxxx/

On Mon, 2025-02-03 at 16:27 -0800, Sean Christopherson wrote:
> On Mon, Feb 03, 2025, Rick P Edgecombe wrote:
> > On Mon, 2025-02-03 at 12:33 -0800, Sean Christopherson wrote:
> > > > Since there is no upstream KVM TDX support yet, why isn't it an option to
> > > > still
> > > > revert the EDKII commit too? It was a relatively recent change.
> > > 
> > > I'm fine with that route too, but it too is a band-aid.  Relying on the
> > > *untrusted*
> > > hypervisor to essentially communicate memory maps is not a winning strategy. 
> > > 
> > > > To me it seems that the normal KVM MTRR support is not ideal, because it is
> > > > still lying about what it is doing. For example, in the past there was an
> > > > attempt to use UC to prevent speculative execution accesses to sensitive
> > > > data.
> > > > The KVM MTRR support only happens to work with existing guests, but not all
> > > > possible MTRR usages.
> > > > 
> > > > Since diverging from the architecture creates loose ends like that, we could
> > > > instead define some other way for EDKII to communicate the ranges to the
> > > > kernel.
> > > > Like some simple KVM PV MSRs that are for communication only, and not
> > > 
> > > Hard "no" to any PV solution.  This isn't KVM specific, and as above, bouncing
> > > through the hypervisor to communicate information within the guest is asinine,
> > > especially for CoCo VMs.
> > 
> > Hmm, right.
> > 
> > So the other options could be:
> > 
> > 1. Some TDX module feature to hold the ranges:
> >  - Con: Not shared with AMD
> > 
> > 2. Re-use MTRRs for the communication, revert changes in guest and edk2:
> 
> Thinking more about how EDK2 is consumed downstream, I think reverting the EDK2
> changes is necessary regardless of what happens in the kernel.  Or at the least,
> somehow communicate to EDK2 users that ingesting those changes is a bad idea
> unless the kernel has also been updated.
> 
> AFAIK, Bring Your Own Firmware[*] isn't widely adopted, which means that the CSP
> is shipping the firmware.  And shipping OVMF/EDK2 with the "ignores MTRRs" code
> will cause problems for guests without commit 8e690b817e38 ("x86/kvm: Override
> default caching mode for SEV-SNP and TDX").  Since the host doesn't control the
> guest kernel, there's no way to know if deploying those EDK2 changes is safe.
>  
> [*] https://kvm-forum.qemu.org/2024/BYOF_-_KVM_Forum_2024_iWTioIP.pdf
> 

Hmm. Since there is no upstream TDX KVM support, for it's part, I guess KVM
should still get a chance to define a cleaner solution (if there actually was a
cleaner solution). But yea, it would mean only components from after the
solution was settled could be used together for a fully working stack. And
it should probably be called out somehow. Maybe could be in the KVM TDX docs or
something.

Still seems like a thing to avoid if possible.

> >  - Con: Creating more half support, when it's technically not required
> >  - Con: Still bouncing through the hypervisor
> 
> I assume by "Re-use MTRRs for the communication" you also mean updating the guest
> to address the "everything is UC!" flaw, otherwise another con is:
> 
>    - Con: Doesn't address the performance issue with TDX guests "using" UC
>           memory by default (unless there's yet more enabled).

Hmm. This is quite the tangled corner.

> 
> Presumably that can be accomplished by simply skipping the CR0.CD toggling, and
> doing MTRR stuff as nonrmal?

I'll have to get back to you on this one. Kirill probably could give a better
answer, but likely will not be able to follow up on this thread until next week.

> 
> >  - Pro: Design and code is clear
> > 
> > 3. Create some new architectural definition, like a bit that means "MTRRs don't
> > actually work:
> >  - Con: Takes a long time, need to get agreement
> >  - Con: Still bouncing through the hypervisor
> 
> Not for KVM guests.  As I laid out in my bug report, it's safe to assume MTRRs
> don't actually affect the memory type when running under KVM.
> 
> FWIW, PAT doesn't "work" on most KVM Intel setups either, because of misguided
> KVM code that resulted in "Ignore Guest PAT" being set in all EPTEs for the
> overwhelming majority of guests.  That's not desirable long term because it
> prevents the guest from using WC (via PAT) in situations where doing so is needed
> for performance and/or correctness.
> 
> >  - Pro: More pure solution
> 
> MTRRs "not working" is a red herring.  The problem isn't that MTRRs don't work,
> it's that the kernel is (somewhat unknowingly) using MTRRs as a crutch to get the
> desired memtype for devices.  E.g. for emulated MMIO, MTRRs _can't_ be virtualized,
> because there's never a valid mapping, i.e. there is no physical memory and thus
> no memtype.  In other words, under KVM guests (and possibly other hypervisors),
> MTRRs end up being nothing more than a communication channel between guest firmware
> and the kernel.

Yea.

> 
> The gap for CoCo VMs is that using MTRRs is undesirable because they are controlled
> by the untrusted host.  But that's largely a future problem, unless someone has a
> clever way to fix the kernel mess.
> 
> 

Yea, I wondered about that too. I imagine the thinking was that since it is only
controlling shared memory, it can be untrusted.

And I guess the solution in this patchset is hypothetically a bit more locked
down in that respect.