On Tue, 2024-05-14 at 17:59 -0700, Rick Edgecombe wrote: > When virtualizing some CPU features, KVM uses kvm_zap_gfn_range() to zap > guest mappings so they can be faulted in with different PTE properties. > > For TDX private memory this technique is fundamentally not possible. > Remapping private memory requires the guest to "accept" it, and also the > needed PTE properties are not currently supported by TDX for private > memory. > > These CPU features are: > 1) MTRR update > 2) CR0.CD update > 3) Non-coherent DMA status update > 4) APICV update > > Since they cannot be supported, they should be blocked from being > exercised by a TD. In the case of CRO.CD, the feature is fundamentally not > supported for TDX as it cannot see the guest registers. For APICV > inhibit it in future changes. > > Guest MTRR support is more of an interesting case. Supported versions of > the TDX module fix the MTRR CPUID bit to 1, but as previously discussed, > it is not possible to fully support the feature. This leaves KVM with a > few options: > - Support a modified version of the architecture where the caching > attributes are ignored for private memory. > - Don't support MTRRs and treat the set MTRR CPUID bit as a TDX Module > bug. > > With the additional consideration that likely guest MTRR support in KVM > will be going away, the later option is the best. Prevent MTRR MSR writes > from calling kvm_zap_gfn_range() in future changes. > > Lastly, the most interesting case is non-coherent DMA status updates. > There isn't a way to reject the call. KVM is just notified that there is a > non-coherent DMA device attached, and expected to act accordingly. For > normal VMs today, that means to start respecting guest PAT. However, > recently there has been a proposal to avoid doing this on selfsnoop CPUs > (see link). On such CPUs it should not be problematic to simply always > configure the EPT to honor guest PAT. In future changes TDX can enforce > this behavior for shared memory, resulting in shared memory always > respecting guest PAT for TDX. So kvm_zap_gfn_range() will not need to be > called in this case either. > > Unfortunately, this will result in different cache attributes between > private and shared memory, as private memory is always WB and cannot be > changed by the VMM on current TDX modules. But it can't really be helped > while also supporting non-coherent DMA devices. > > Since all callers will be prevented from calling kvm_zap_gfn_range() in > future changes, report a bug and terminate the guest if other future > changes to KVM result in triggering kvm_zap_gfn_range() for a TD. > > For lack of a better method currently, use kvm_gfn_shared_mask() to > determine if private memory cannot be zapped (as in TDX, the only VM type > that sets it). > > Link: https://lore.kernel.org/all/20240309010929.1403984-6-seanjc@xxxxxxxxxx/ > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx> > --- > TDX MMU Part 1: > - Remove support from "KVM: x86/tdp_mmu: Zap leafs only for private memory" > - Add this KVM_BUG_ON() instead > --- > arch/x86/kvm/mmu/mmu.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index d5cf5b15a10e..808805b3478d 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -6528,8 +6528,17 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) > > flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end); > > - if (tdp_mmu_enabled) > + if (tdp_mmu_enabled) { > + /* > + * kvm_zap_gfn_range() is used when MTRR or PAT memory > + * type was changed. TDX can't handle zapping the private > + * mapping, but it's ok because KVM doesn't support either of > + * those features for TDX. In case a new caller appears, BUG > + * the VM if it's called for solutions with private aliases. > + */ > + KVM_BUG_ON(kvm_gfn_shared_mask(kvm), kvm); > flush = kvm_tdp_mmu_zap_leafs(kvm, gfn_start, gfn_end, flush); > + } > > if (flush) > kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start); kvm_zap_gfn_range() looks a generic function. I think it makes more sense to let the callers to explicitly check whether VM is TDX guest and do the KVM_BUG_ON()?