On 3/28/2024 8:36 AM, Isaku Yamahata wrote:
On Thu, Mar 28, 2024 at 08:06:53AM +0800,
Xiaoyao Li <xiaoyao.li@xxxxxxxxx> wrote:
On 3/28/2024 1:36 AM, Edgecombe, Rick P wrote:
On Wed, 2024-03-27 at 10:54 +0800, Xiaoyao Li wrote:
If QEMU doesn't configure the MSR filter list correctly, KVM has to handle
the guest's MTRR MSR accesses. In my understanding, the
suggestion is that KVM zaps private memory mappings.
TDX spec states that
18.2.1.4.1 Memory Type for Private and Opaque Access
The memory type for private and opaque access semantics, which use a
private HKID, is WB.
18.2.1.4.2 Memory Type for Shared Accesses
Intel SDM, Vol. 3, 28.2.7.2 Memory Type Used for Translated Guest-
Physical Addresses
The memory type for shared access semantics, which use a shared HKID,
is determined as described below. Note that this is different from the
way memory type is determined by the hardware during non-root mode
operation. Rather, it is a best-effort approximation that is designed
to still allow the host VMM some control over memory type.
• For shared access during host-side (SEAMCALL) flows, the memory
  type is determined by MTRRs.
• For shared access during guest-side flows (VM exit from the guest
  TD), the memory type is determined by a combination of the Shared
  EPT and MTRRs.
  o If the memory type determined during Shared EPT walk is WB, then
    the effective memory type for the access is determined by MTRRs.
  o Else, the effective memory type for the access is UC.
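For illustration only, the quoted rules can be modelled as a small C
function. This is a sketch of the spec text above, not TDX module or KVM
code; the enum and function names (memtype, access_flow,
shared_access_memtype, private_access_memtype) are made up for this example.

```c
#include <assert.h>

/* Memory types, reduced to symbolic names (hypothetical enum). */
enum memtype { MT_UC, MT_WC, MT_WT, MT_WP, MT_WB };

/* Which kind of flow performs the shared access (hypothetical enum). */
enum access_flow { FLOW_HOST_SEAMCALL, FLOW_GUEST_SIDE };

/* Private and opaque accesses (private HKID) are always WB. */
static enum memtype private_access_memtype(void)
{
	return MT_WB;
}

/*
 * Shared accesses (shared HKID):
 *  - host-side (SEAMCALL) flows: MTRRs decide;
 *  - guest-side flows: MTRRs decide only if the Shared EPT walk
 *    yielded WB, otherwise the access is UC.
 */
static enum memtype shared_access_memtype(enum access_flow flow,
					  enum memtype sept_type,
					  enum memtype mtrr_type)
{
	if (flow == FLOW_HOST_SEAMCALL)
		return mtrr_type;

	assert(flow == FLOW_GUEST_SIDE);
	return sept_type == MT_WB ? mtrr_type : MT_UC;
}
```

Note that mtrr_type here is the MTRR setting as seen by the TDX module on
the host side; nothing in this model takes a guest MTRR value as input
for the private case.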
My understanding is that guest MTRR doesn't affect the memory type for
private memory. So we don't need to zap private memory mappings.
So, there is no point in (trying to) emulate MTRR. The direction is: don't
advertise MTRR to the guest (a new TDX module is needed), or force
the guest not to use MTRR (guest command line clearcpuid=mtrr).
Ideally, it would be better if the TD guest learns to disable/not use MTRR
itself.
KVM will
simply return an error on guest accesses to MTRR-related registers.
QEMU or another user space VMM can use the MSR filter if it wants.
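To make "MTRR related registers" concrete, here is a minimal sketch of a
predicate over the architectural MTRR MSR indices, plus a wrapper that
fails such accesses. It is not the actual KVM code: the names is_mtrr_msr,
td_msr_access and NUM_VAR_MTRRS are hypothetical, and the count of 8
variable-range pairs is an assumption (the real count comes from
IA32_MTRRCAP.VCNT).

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_MTRRCAP		0x0fe
#define MSR_MTRR_PHYSBASE0	0x200	/* PHYSBASEn/PHYSMASKn pairs follow */
#define MSR_MTRR_FIX64K_00000	0x250
#define MSR_MTRR_FIX16K_80000	0x258
#define MSR_MTRR_FIX16K_A0000	0x259
#define MSR_MTRR_FIX4K_C0000	0x268
#define MSR_MTRR_FIX4K_F8000	0x26f
#define MSR_MTRR_DEF_TYPE	0x2ff

#define NUM_VAR_MTRRS		8	/* assumption, see above */

/* Return true if @msr is one of the MTRR MSRs. */
static bool is_mtrr_msr(uint32_t msr)
{
	/* Variable-range MTRRs: one base and one mask MSR per range. */
	if (msr >= MSR_MTRR_PHYSBASE0 &&
	    msr < MSR_MTRR_PHYSBASE0 + 2 * NUM_VAR_MTRRS)
		return true;

	/* Fixed-range 4K MTRRs are contiguous. */
	if (msr >= MSR_MTRR_FIX4K_C0000 && msr <= MSR_MTRR_FIX4K_F8000)
		return true;

	switch (msr) {
	case MSR_MTRRCAP:
	case MSR_MTRR_FIX64K_00000:
	case MSR_MTRR_FIX16K_80000:
	case MSR_MTRR_FIX16K_A0000:
	case MSR_MTRR_DEF_TYPE:
		return true;
	}
	return false;
}

/*
 * Hypothetical policy for a TD: fail every MTRR MSR access.
 * Returns nonzero for "report an error to the guest", 0 otherwise.
 */
static int td_msr_access(uint32_t msr)
{
	return is_mtrr_msr(msr) ? 1 : 0;
}
```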
But guests won't accept memory again because no one
currently requests guests to do this after writes to MTRR MSRs. In this case,
guests may access unaccepted memory, causing an infinite EPT violation loop
(assuming SEPT_VE_DISABLE is set). This won't impact other guests/workloads on
the host. But I think it would be better if we could avoid wasting CPU
resources on a useless EPT violation loop.
QEMU is expected to do it correctly. There are many ways for userspace to go
wrong. This isn't specific to MTRR MSRs.
This seems incorrect. KVM shouldn't force userspace to filter some
specific MSRs. The semantics of the MSR filter are that userspace configures
it of its own will, not that KVM requires it to do so.
I'm ok with just always doing the exit to userspace on any attempt to use
MTRRs in a TD, and not relying on the MSR list. At least I don't see the
problem.
What is the exit reason in vcpu->run->exit_reason? KVM_EXIT_X86_RDMSR/WRMSR?
If so, it breaks the ABI on KVM_EXIT_X86_RDMSR/WRMSR.
It's only when the user space requested it with the MSR filter.
Right. But userspace has no reason to filter them, because userspace can
do nothing except either 1) kill the TD or 2) eat the instruction.