On Tue, Jun 28, 2022 at 03:43:00PM +1200, Kai Huang <kai.huang@xxxxxxxxx> wrote:

> On Mon, 2022-06-27 at 14:52 -0700, isaku.yamahata@xxxxxxxxx wrote:
> > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> >
> > TDX requires several initialization steps for KVM to create guest TDs.
> > Detect CPU feature, enable VMX (TDX is based on VMX), detect TDX module
> > availability, and initialize TDX module. This patch implements the first
> > step to detect CPU feature. Because VMX isn't enabled yet by VMXON
> > instruction on KVM kernel module initialization, defer further
> > initialization step until VMX is enabled by hardware_enable callback.
>
> Not clear why you need to split into multiple patches. If we put all
> initialization into one patch, it's much easier to see why those steps are done
> in whatever way.

I moved this patch down so that it comes right before "KVM: TDX: Initialize TDX
module when loading kvm_intel.ko". The patch flow is now:
- detect the TDX CPU feature, and then
- initialize the TDX module.

> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > new file mode 100644
> > index 000000000000..c12e61cdddea
> > --- /dev/null
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -0,0 +1,40 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include <linux/cpu.h>
> > +
> > +#include <asm/tdx.h>
> > +
> > +#include "capabilities.h"
> > +#include "x86_ops.h"
> > +
> > +#undef pr_fmt
> > +#define pr_fmt(fmt) "tdx: " fmt
> > +
> > +static u64 hkid_mask __ro_after_init;
> > +static u8 hkid_start_pos __ro_after_init;
> > +
> > +int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops)
> > +{
> > +	u32 max_pa;
> > +
> > +	if (!enable_ept) {
> > +		pr_warn("Cannot enable TDX with EPT disabled\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!platform_tdx_enabled()) {
> > +		pr_warn("Cannot enable TDX on TDX disabled platform\n");
> > +		return -ENODEV;
> > +	}
> > +
> > +	/* Safe guard check because TDX overrides tlb_remote_flush callback. */
> > +	if (WARN_ON_ONCE(x86_ops->tlb_remote_flush))
> > +		return -EIO;
>
> To me it's better to move this chunk to the patch which actually implements how
> to flush TLB for private pages. W/o some background, it's hard to tell why TDX
> needs to override tlb_remote_flush callback. Otherwise it's quite hard to
> review here.
>
> For instance, even if it must be replaced, I am wondering why it must be empty
> at the beginning? For instance, assuming it has an original version which does
> something:
>
> 	x86_ops->tlb_remote_flush = vmx_remote_flush;
>
> Why cannot it be replaced with vt_tlb_remote_flush():
>
> int vt_tlb_remote_flush(struct kvm *kvm)
> {
> 	if (is_td(kvm))
> 		return tdx_tlb_remote_flush(kvm);
>
> 	return vmx_remote_flush(kvm);
> }
>
> ?

There is a slightly tricky part, but I have rewritten it along those lines
anyway. Here is a preparation patch that makes this approach possible.

Subject: [PATCH 055/290] KVM: x86/VMX: introduce vmx tlb_remote_flush and tlb_remote_flush_with_range

This is a preparation for TDX to define its own tlb_remote_flush and
tlb_remote_flush_with_range. Currently the VMX code leaves tlb_remote_flush
and tlb_remote_flush_with_range NULL by default and sets them to non-NULL
methods only when KVM runs as a nested guest on Hyper-V. So that the TDX code
can override these two methods in the same way as the other methods, define
vmx_tlb_remote_flush and vmx_tlb_remote_flush_with_range as methods that
return -EOPNOTSUPP (so that the caller falls back to its generic flush path)
and call into the Hyper-V code only in the nested Hyper-V guest case.
Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
---
 arch/x86/kvm/kvm_onhyperv.c     |  5 ++++-
 arch/x86/kvm/kvm_onhyperv.h     |  1 +
 arch/x86/kvm/mmu/mmu.c          |  2 +-
 arch/x86/kvm/svm/svm_onhyperv.h |  1 +
 arch/x86/kvm/vmx/main.c         |  2 ++
 arch/x86/kvm/vmx/vmx.c          | 34 ++++++++++++++++++++++++++++-----
 arch/x86/kvm/vmx/x86_ops.h      |  3 +++
 7 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/kvm_onhyperv.c b/arch/x86/kvm/kvm_onhyperv.c
index ee4f696a0782..d43518da1c0e 100644
--- a/arch/x86/kvm/kvm_onhyperv.c
+++ b/arch/x86/kvm/kvm_onhyperv.c
@@ -93,11 +93,14 @@ int hv_remote_flush_tlb(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(hv_remote_flush_tlb);
 
+bool hv_use_remote_flush_tlb __ro_after_init;
+EXPORT_SYMBOL_GPL(hv_use_remote_flush_tlb);
+
 void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
 {
 	struct kvm_arch *kvm_arch = &vcpu->kvm->arch;
 
-	if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb) {
+	if (hv_use_remote_flush_tlb) {
 		spin_lock(&kvm_arch->hv_root_tdp_lock);
 		vcpu->arch.hv_root_tdp = root_tdp;
 		if (root_tdp != kvm_arch->hv_root_tdp)
diff --git a/arch/x86/kvm/kvm_onhyperv.h b/arch/x86/kvm/kvm_onhyperv.h
index 287e98ef9df3..9a07a34666fb 100644
--- a/arch/x86/kvm/kvm_onhyperv.h
+++ b/arch/x86/kvm/kvm_onhyperv.h
@@ -10,6 +10,7 @@
 int hv_remote_flush_tlb_with_range(struct kvm *kvm,
 		struct kvm_tlb_range *range);
 int hv_remote_flush_tlb(struct kvm *kvm);
+extern bool hv_use_remote_flush_tlb __ro_after_init;
 void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp);
 #else /* !CONFIG_HYPERV */
 static inline void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ef925722ee28..a11c78c8831b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -264,7 +264,7 @@ static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
 {
 	int ret = -ENOTSUPP;
 
-	if (range && kvm_x86_ops.tlb_remote_flush_with_range)
+	if (range && kvm_available_flush_tlb_with_range())
 		ret = static_call(kvm_x86_tlb_remote_flush_with_range)(kvm, range);
 
 	if (ret)
diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
index e2fc59380465..b3cd61c62305 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.h
+++ b/arch/x86/kvm/svm/svm_onhyperv.h
@@ -36,6 +36,7 @@ static inline void svm_hv_hardware_setup(void)
 		svm_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
 		svm_x86_ops.tlb_remote_flush_with_range =
 				hv_remote_flush_tlb_with_range;
+		hv_use_remote_flush_tlb = true;
 	}
 
 	if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH) {
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 252b7298b230..e52e12b8d49a 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -187,6 +187,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.flush_tlb_all = vmx_flush_tlb_all,
 	.flush_tlb_current = vmx_flush_tlb_current,
+	.tlb_remote_flush = vmx_tlb_remote_flush,
+	.tlb_remote_flush_with_range = vmx_tlb_remote_flush_with_range,
 	.flush_tlb_gva = vmx_flush_tlb_gva,
 	.flush_tlb_guest = vmx_flush_tlb_guest,
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5b8d399dd319..dc7ede3706e1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3110,6 +3110,33 @@ void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
 		vpid_sync_context(vmx_get_current_vpid(vcpu));
 }
 
+int vmx_tlb_remote_flush(struct kvm *kvm)
+{
+#if IS_ENABLED(CONFIG_HYPERV)
+	if (hv_use_remote_flush_tlb)
+		return hv_remote_flush_tlb(kvm);
+#endif
+	/*
+	 * fallback to KVM_REQ_TLB_FLUSH.
+	 * See kvm_arch_flush_remote_tlb() and kvm_flush_remote_tlbs().
+	 */
+	return -EOPNOTSUPP;
+}
+
+int vmx_tlb_remote_flush_with_range(struct kvm *kvm,
+				    struct kvm_tlb_range *range)
+{
+#if IS_ENABLED(CONFIG_HYPERV)
+	if (hv_use_remote_flush_tlb)
+		return hv_remote_flush_tlb_with_range(kvm, range);
+#endif
+	/*
+	 * fallback to tlb_remote_flush. See
+	 * kvm_flush_remote_tlbs_with_range()
+	 */
+	return -EOPNOTSUPP;
+}
+
 void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
 {
 	/*
@@ -8176,11 +8203,8 @@ __init int vmx_hardware_setup(void)
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
-	    && enable_ept) {
-		vt_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
-		vt_x86_ops.tlb_remote_flush_with_range =
-				hv_remote_flush_tlb_with_range;
-	}
+	    && enable_ept)
+		hv_use_remote_flush_tlb = true;
 #endif
 
 	if (!cpu_has_vmx_ple()) {
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index e70f84d29d21..5ecf99170b30 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -90,6 +90,9 @@ void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
 bool vmx_get_if_flag(struct kvm_vcpu *vcpu);
 void vmx_flush_tlb_all(struct kvm_vcpu *vcpu);
 void vmx_flush_tlb_current(struct kvm_vcpu *vcpu);
+int vmx_tlb_remote_flush(struct kvm *kvm);
+int vmx_tlb_remote_flush_with_range(struct kvm *kvm,
+				    struct kvm_tlb_range *range);
 void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr);
 void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu);
 void vmx_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask);
-- 
2.25.1

> > +
> > +	max_pa = cpuid_eax(0x80000008) & 0xff;
> > +	hkid_start_pos = boot_cpu_data.x86_phys_bits;
> > +	hkid_mask = GENMASK_ULL(max_pa - 1, hkid_start_pos);
> > +	pr_info("kvm: TDX is supported. hkid start pos %d mask 0x%llx\n",
> > +		hkid_start_pos, hkid_mask);
>
> Again, I think it's better to introduce those in the patch where you actually
> need those. It will be more clear if you introduce those with the code which
> actually uses them.
>
> For instance, I think both hkid_start_pos and hkid_mask are not necessary. If
> you want to apply one keyid to an address, isn't below enough?
>
> 	u64 phys |= ((keyid) << boot_cpu_data.x86_phys_bits);

Nice catch. I removed max_pa, hkid_start_pos and hkid_mask.
> > diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> > index 0f8a8547958f..0a5967a91e26 100644
> > --- a/arch/x86/kvm/vmx/x86_ops.h
> > +++ b/arch/x86/kvm/vmx/x86_ops.h
> > @@ -122,4 +122,10 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
> >  #endif
> >  void vmx_setup_mce(struct kvm_vcpu *vcpu);
> >  
> > +#ifdef CONFIG_INTEL_TDX_HOST
> > +int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops);
> > +#else
> > +static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return 0; }
> > +#endif
>
> I think if you introduce a "tdx_ops.h", or "tdx_x86_ops.h", and you can only
> include it when CONFIG_INTEL_TDX_HOST is true, then in tdx_ops.h you don't need
> those stubs.
>
> Makes sense?

main.c calls many tdx_xxx() functions. Without the stubs, main.c would need
many #ifdef CONFIG_INTEL_TDX_HOST blocks around those calls.
-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>