Re: [PATCH v19 059/130] KVM: x86/tdp_mmu: Don't zap private pages for unsupported cases

Xiaoyao Li <xiaoyao.li@xxxxxxxxx> · Thu, 28 Mar 2024 08:58:23 +0800

On 3/28/2024 8:45 AM, Edgecombe, Rick P wrote:
On Thu, 2024-03-28 at 08:06 +0800, Xiaoyao Li wrote:

TDX spec states that

    18.2.1.4.1 Memory Type for Private and Opaque Access

    The memory type for private and opaque access semantics, which use a
    private HKID, is WB.

    18.2.1.4.2 Memory Type for Shared Accesses

    Intel SDM, Vol. 3, 28.2.7.2 Memory Type Used for Translated Guest-
    Physical Addresses

    The memory type for shared access semantics, which use a shared HKID,
    is determined as described below. Note that this is different from the
    way memory type is determined by the hardware during non-root mode
    operation. Rather, it is a best-effort approximation that is designed
    to still allow the host VMM some control over memory type.
      • For shared access during host-side (SEAMCALL) flows, the memory
        type is determined by MTRRs.
      • For shared access during guest-side flows (VM exit from the guest
        TD), the memory type is determined by a combination of the Shared
        EPT and MTRRs.
        o If the memory type determined during Shared EPT walk is WB, then
          the effective memory type for the access is determined by MTRRs.
        o Else, the effective memory type for the access is UC.

My understanding is that guest MTRR doesn't affect the memory type for
private memory. So we don't need to zap private memory mappings.
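
To make the quoted rules concrete, here is a minimal sketch of how the
effective memory type would be chosen. The names (tdx_effective_memtype,
enum access_kind) are made up for illustration and come from neither the
TDX module nor KVM. The numeric values are the usual x86 memory type
encodings. mtrr_type stands for whatever the MTRRs yield for the address.

```c
#include <assert.h>

/* x86 memory type encodings (UC=0, WC=1, WT=4, WP=5, WB=6). */
enum mem_type { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };

/* Hypothetical classification of an access, following the spec text. */
enum access_kind {
	ACCESS_PRIVATE,		/* private or opaque, uses a private HKID */
	ACCESS_SHARED_HOST,	/* shared HKID, host-side (SEAMCALL) flow */
	ACCESS_SHARED_GUEST,	/* shared HKID, guest-side flow */
};

/*
 * sept_type: memory type found during the Shared EPT walk
 *            (only consulted for guest-side shared accesses).
 * mtrr_type: memory type the MTRRs yield for the address.
 */
enum mem_type tdx_effective_memtype(enum access_kind kind,
				    enum mem_type sept_type,
				    enum mem_type mtrr_type)
{
	switch (kind) {
	case ACCESS_PRIVATE:
		/* 18.2.1.4.1: always WB, nothing else is consulted. */
		return MT_WB;
	case ACCESS_SHARED_HOST:
		/* 18.2.1.4.2: host-side flows follow the MTRRs. */
		return mtrr_type;
	case ACCESS_SHARED_GUEST:
		/* 18.2.1.4.2: WB in Shared EPT defers to MTRRs, else UC. */
		return sept_type == MT_WB ? mtrr_type : MT_UC;
	}
	assert(!"unknown access kind");
	return MT_UC;
}
```

In this sketch the private case ignores both inputs, which is the point
above: nothing the guest programs into its MTRRs changes the result.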

Right, KVM can't zap the private side.

But why does KVM have to support a "best effort" MTRR virtualization for TDs? Kai pointed me to this
today and I haven't looked through it in depth yet:
https://lore.kernel.org/kvm/20240309010929.1403984-1-seanjc@xxxxxxxxxx/

An alternative could be to mirror that behavior, but normal VMs have to work with existing userspace
setup. KVM doesn't support any TDs yet, so we can take the opportunity to not introduce weird
things.

Not providing any MTRR support for TDs is what I prefer.

But guests won't accept memory again, because no one currently requests
guests to do this after writes to MTRR MSRs. In this case, guests may
access unaccepted memory, causing an infinite EPT violation loop (assuming
SEPT_VE_DISABLE is set). This won't impact other guests/workloads on the
host. But I think it would be better if we can avoid wasting CPU resources
on the useless EPT violation loop.

QEMU is expected to do it correctly. There are many ways for userspace to go
wrong. This isn't specific to MTRR MSRs.

This seems incorrect. KVM shouldn't force userspace to filter some
specific MSRs. The semantics of the MSR filter are that userspace
configures it of its own will, not that KVM requires it to do so.

I'm ok just always doing the exit to userspace on attempt to use MTRRs in a TD, and not rely on
the MSR list. At least I don't see the problem.

What is the exit reason in vcpu->run->exit_reason?
KVM_EXIT_X86_RDMSR/WRMSR? If so, it breaks the ABI on
KVM_EXIT_X86_RDMSR/WRMSR.

How so? Userspace needs to learn to create a TD first.

The current ABI of KVM_EXIT_X86_RDMSR/WRMSR is that userspace itself
sets up the MSR filter first, and then gets this exit reason when the
guest accesses one of the filtered MSRs.

If you want to use this exit reason, then you need to enforce that
userspace sets up the MSR filter. How would that be enforced? If it is
not enforced, and KVM exits with KVM_EXIT_X86_RDMSR/WRMSR whether or not
userspace has set up the MSR filter, then you are introducing divergent
behavior in KVM.
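
To illustrate the divergence, a minimal model follows. It is not KVM
code: struct vm_model, current_abi() and proposed_td_behavior() are
hypothetical names. The MTRR MSR indices are the architectural ones
(first eight variable-range pairs, the fixed ranges, and
IA32_MTRR_DEF_TYPE). The model assumes a filtered MSR either exits to
userspace or gets #GP, depending on whether userspace opted in to exits
for filtered MSRs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of a guest RDMSR/WRMSR in this model. */
enum msr_outcome {
	MSR_HANDLED_IN_KVM,	/* KVM handles the access itself */
	MSR_INJECT_GP,		/* guest receives #GP */
	MSR_EXIT_TO_USER,	/* KVM_EXIT_X86_RDMSR/WRMSR */
};

struct vm_model {
	bool is_td;			/* is this a TDX guest? */
	bool filter_denies_msr;		/* userspace filter denies this MSR */
	bool filter_exit_enabled;	/* userspace opted in to filter exits */
};

/* Architectural MTRR MSR indices; 0x277 (IA32_PAT) is not one of them. */
static bool is_mtrr_msr(uint32_t msr)
{
	return (msr >= 0x200 && msr <= 0x20f) ||	/* PHYSBASE/MASK 0-7 */
	       msr == 0x250 || msr == 0x258 || msr == 0x259 ||
	       (msr >= 0x268 && msr <= 0x26f) ||	/* fixed 4K ranges */
	       msr == 0x2ff;				/* DEF_TYPE */
}

/* Current ABI: the exit happens only if userspace asked for it. */
enum msr_outcome current_abi(const struct vm_model *vm)
{
	assert(vm);
	if (!vm->filter_denies_msr)
		return MSR_HANDLED_IN_KVM;
	return vm->filter_exit_enabled ? MSR_EXIT_TO_USER : MSR_INJECT_GP;
}

/* Proposal under discussion: TD + MTRR MSR always exits, filter or not. */
enum msr_outcome proposed_td_behavior(const struct vm_model *vm, uint32_t msr)
{
	assert(vm);
	if (vm->is_td && is_mtrr_msr(msr))
		return MSR_EXIT_TO_USER;
	return current_abi(vm);
}
```

With no filter installed, current_abi() never returns MSR_EXIT_TO_USER,
while proposed_td_behavior() does for a TD touching an MTRR MSR. That
is the divergence I mean.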