Re: [PATCH v19 059/130] KVM: x86/tdp_mmu: Don't zap private pages for unsupported cases

Xiaoyao Li <xiaoyao.li@xxxxxxxxx> · Thu, 28 Mar 2024 21:21:37 +0800

On 3/28/2024 6:17 PM, Chao Gao wrote:
On Thu, Mar 28, 2024 at 11:40:27AM +0800, Xiaoyao Li wrote:
On 3/28/2024 11:04 AM, Edgecombe, Rick P wrote:
On Thu, 2024-03-28 at 09:30 +0800, Xiaoyao Li wrote:
The current ABI of KVM_EXIT_X86_RDMSR when TDs are created is nothing. So I don't see how this is
any kind of ABI break. If you agree we shouldn't try to support MTRRs, do you have a different exit
reason or behavior in mind?

Just return an error on the TDVMCALL for RDMSR/WRMSR when the TD accesses MTRR MSRs.
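
For illustration only, a minimal sketch of how "is this an MTRR MSR?" could be decided before returning the error. The MSR indices are the architectural ones from the Intel SDM; the helper name, the macro names and the limit of 8 variable ranges are assumptions made for this sketch, not code from the patch series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MTRR MSR indices; macro names are illustrative. */
#define MSR_MTRRCAP           0x0fe
#define MSR_MTRR_PHYSBASE0    0x200 /* first variable-range base/mask pair */
#define NR_VAR_MTRR           8     /* assumption: 8 variable ranges */
#define MSR_MTRR_FIX64K_00000 0x250
#define MSR_MTRR_FIX16K_80000 0x258
#define MSR_MTRR_FIX16K_A0000 0x259
#define MSR_MTRR_FIX4K_C0000  0x268
#define MSR_MTRR_FIX4K_F8000  0x26f
#define MSR_MTRR_DEF_TYPE     0x2ff

/* Hypothetical helper: true if @msr is one of the MTRR MSRs. */
static bool is_mtrr_msr(uint32_t msr)
{
	/* Variable ranges: PHYSBASEn/PHYSMASKn pairs starting at 0x200. */
	if (msr >= MSR_MTRR_PHYSBASE0 &&
	    msr < MSR_MTRR_PHYSBASE0 + 2 * NR_VAR_MTRR)
		return true;

	/* Fixed 4K ranges occupy a contiguous block of eight MSRs. */
	if (msr >= MSR_MTRR_FIX4K_C0000 && msr <= MSR_MTRR_FIX4K_F8000)
		return true;

	switch (msr) {
	case MSR_MTRRCAP:
	case MSR_MTRR_FIX64K_00000:
	case MSR_MTRR_FIX16K_80000:
	case MSR_MTRR_FIX16K_A0000:
	case MSR_MTRR_DEF_TYPE:
		return true;
	default:
		return false;
	}
}
```

A handler for the RDMSR/WRMSR TDVMCALL could call such a helper first and fail the call for any index it matches.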

MTRR appears to be configured to be type "Fixed" in the TDX module. So the guest could expect to be
able to use it and be surprised by a #GP.

          {
            "MSB": "12",
            "LSB": "12",
            "Field Size": "1",
            "Field Name": "MTRR",
            "Configuration Details": null,
            "Bit or Field Virtualization Type": "Fixed",
            "Virtualization Details": "0x1"
          },
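
To make the "Fixed" type concrete, here is a rough sketch, assuming the excerpt above describes CPUID leaf 1 EDX (where MTRR is architecturally bit 12). The function and macro names are hypothetical and this is not how the TDX module is implemented; it only shows that whatever the VMM requests, the guest still sees the bit as 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.1:EDX bit 12 is the architectural MTRR feature bit. */
#define CPUID_1_EDX_MTRR (1u << 12)

/* Hypothetical mask of EDX bits with "Fixed" type and value 0x1. */
#define FIXED1_EDX_MASK  CPUID_1_EDX_MTRR

/* Value the guest would observe: fixed-1 bits are forced on. */
static uint32_t apply_fixed1(uint32_t vmm_requested_edx)
{
	return vmm_requested_edx | FIXED1_EDX_MASK;
}

/* True if the guest-visible EDX value advertises MTRR. */
static bool guest_sees_mtrr(uint32_t guest_edx)
{
	return (guest_edx & CPUID_1_EDX_MTRR) != 0;
}
```

So even a VMM that requests EDX with the bit cleared ends up with a guest that believes MTRRs exist.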

If KVM does not support MTRRs in TDX, then it has to return the error somewhere or pretend to
support it (do nothing but not return an error). Returning an error to the guest would be making up
arch behavior, and to a lesser degree so would ignoring the WRMSR.

The root cause is that it's a bad design of TDX to make MTRR fixed1. When
guest reads MTRR CPUID as 1 while getting #VE on MTRR MSRs, it already breaks
the architectural behavior. (MCA faces a similar issue: MCA is fixed1 as

I won't say #VE on MTRR MSRs breaks anything. Writes to other MSRs (e.g.
TSC_DEADLINE MSR) also lead to #VE. If KVM can emulate the MSR accesses, #VE
should be fine.

The problem is: MTRR CPUID feature is fixed 1 while KVM/QEMU doesn't know how
to virtualize MTRR especially given that KVM cannot control the memory type in
secure-EPT entries.

Yes, I partly agree with "I won't say #VE on MTRR MSRs breaks anything". #VE
itself is not the problem; the problem is whether the #VE is opt-in or
unconditional.

For the TSC_DEADLINE MSR, #VE is actually opt-in.
CPUID(1).ECX[24].TSC_DEADLINE is configurable by the VMM. Only when the VMM
configures the bit to 1 will the TD guest get #VE. If the VMM configures it
to 0, the TD guest just gets #GP. This is the reasonable design.
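
A small sketch of the distinction I mean, with hypothetical names (this models the behavior described above, it is not TDX module code):

```c
#include <stdbool.h>

/* What the TD guest observes on the MSR access. */
enum td_msr_outcome {
	TD_MSR_GP, /* guest gets #GP */
	TD_MSR_VE, /* guest gets #VE and issues a TDVMCALL */
};

/*
 * Opt-in case: the outcome follows the CPUID bit the VMM configured
 * (CPUID(1).ECX[24] for TSC_DEADLINE).
 */
static enum td_msr_outcome tsc_deadline_outcome(bool vmm_cpuid_bit)
{
	return vmm_cpuid_bit ? TD_MSR_VE : TD_MSR_GP;
}

/*
 * Unconditional case: the MTRR CPUID bit is fixed to 1, so the VMM's
 * preference is ignored and the guest always gets #VE.
 */
static enum td_msr_outcome mtrr_outcome(bool vmm_cpuid_bit)
{
	(void)vmm_cpuid_bit; /* VMM cannot opt out */
	return TD_MSR_VE;
}
```

With the opt-in form a VMM that cannot emulate the MSR simply leaves the bit at 0; with the unconditional form it has no such choice.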

well, while accessing MCA-related MSRs gets #VE. This is why TDX is going to
fix them by introducing a new feature that makes them configurable.)

So that is why I lean towards
returning to userspace and giving the VMM the option to ignore it, return an error to the guest or
show an error to the user.

"show an error to the user" doesn't help at all, because the user cannot fix
it, and neither can QEMU.

The key point isn't who can fix/emulate MTRR MSRs. It is just that KVM doesn't
know how to handle this situation and asks userspace for help.

Whether or how userspace can handle the MSR writes isn't KVM's problem. It may be
better if KVM can tell userspace exactly in which cases KVM will exit to
userspace. But there is no such infrastructure.

An example: in the KVM CET series, we found it complex for the KVM instruction
emulator to emulate control flow instructions when CET is enabled. The
suggestion there is also to punt to userspace (w/o any indication to userspace
that KVM would do this).

Could you point me to the decision on CET? I'm interested in how userspace can
help with that.

If KVM can't support the behavior, better to get an actual error in
userspace than a mysterious guest hang, right?
What behavior do you mean?

Outside of what kind of exit it is, do you object to the general plan to punt to userspace?

Since this is a TDX specific limitation, I guess there is KVM_EXIT_TDX_VMCALL as a general category
of TDVMCALLs that cannot be handled by KVM.

Using KVM_EXIT_TDX_VMCALL looks fine.

We need to explain why MTRR MSRs are handled in this way unlike other MSRs.

It is better if KVM can tell userspace that MTRR virtualization isn't supported
by KVM for TDs. Then userspace should resolve the conflict between KVM and TDX
module on MTRR. But to report MTRR as unsupported, we need to make
GET_SUPPORTED_CPUID a vm-scope ioctl. I am not sure if it is worth the effort.

My memory is that Sean disliked the vm-scope GET_SUPPORTED_CPUID for TDX
when he was at Intel.

Anyway, we can provide a TDX-specific interface to report SUPPORTED_CPUID
in KVM_TDX_CAPABILITIES, if we really need it.

I just don't see any difference between handling it in KVM and handling it in
userspace: either a) return error to guest or b) ignore the WRMSR.
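
To spell out why I see no difference, here is a sketch with made-up names (none of this is existing KVM or QEMU API). The choices discussed in this thread reduce to the same small decision whichever side makes it:

```c
#include <stdbool.h>

/* Hypothetical policy for a TD write to an MTRR MSR. */
enum mtrr_policy {
	MTRR_ERROR_TO_GUEST, /* a) fail the TDVMCALL */
	MTRR_IGNORE_WRITE,   /* b) drop the write, report success */
	MTRR_REPORT_TO_USER, /* stop and surface an error to the user */
};

struct mtrr_result {
	bool guest_sees_error; /* TDVMCALL completes with an error */
	bool vm_stopped;       /* VMM stops the VM and reports */
};

/* Same logic applies whether it runs in KVM or in the VMM. */
static struct mtrr_result handle_mtrr_wrmsr(enum mtrr_policy policy)
{
	struct mtrr_result r = { false, false };

	switch (policy) {
	case MTRR_ERROR_TO_GUEST:
		r.guest_sees_error = true;
		break;
	case MTRR_IGNORE_WRITE:
		/* Pretend the write succeeded; nothing is emulated. */
		break;
	case MTRR_REPORT_TO_USER:
		r.vm_stopped = true;
		break;
	}
	return r;
}
```

Only the third policy needs userspace at all, and as said above it does not help, since neither the user nor QEMU can fix it.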