First of all, I think the patch title can be improved. "refactor CPU compatibility check on module initialization" isn't the purpose of this patch. It is just a bonus. The title should reflect the main purpose (or behaviour) of this patch: KVM: Temporarily enable hardware on all cpus during module loading time On Sun, 2022-08-07 at 15:00 -0700, isaku.yamahata@xxxxxxxxx wrote: > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx> > > TDX module requires its initialization. It requires VMX to be enabled. > Although there are several options of when to initialize it, the choice is > the initialization time of the KVM kernel module. There is no usable > arch-specific hook for the TDX module to utilize during the KVM kernel module > initialization. The code doesn't enable/disable hardware (VMX in TDX case) > during the kernel module initialization. Add a hook for enabling hardware, > arch-specific initialization, and disabling hardware during KVM kernel > module initialization to make a room for TDX module initialization. > The hook is only for arch-specific initialization. It doesn't do hardware_enable_all() and hardware_disable_all(). > The > current KVM enables hardware when the first VM is created and disables > hardware when the last VM is destroyed. When no VM is running, hardware is > disabled. To follow these semantics, the kernel module initialization needs > to disable hardware. Opportunistically refactor the code to enable/disable > hardware. > > Add hadware_enable_all() and hardware_disable_all() to kvm_init() and > introduce a new arch-specific callback function, > kvm_arch_post_hardware_enable_setup, for arch to do arch-specific > initialization that requires hardware_enable_all(). Opportunistically, > move kvm_arch_check_processor_compat() to to hardware_enabled_nolock(). > TDX module initialization code will go into > kvm_arch_post_hardware_enable_setup(). The remaining paragraphs are very hard to follow. IMHO they are too details and are not changelog material. If you can obtain Reviewed-by from reviewer/maintainer for specific arch, then it implies the code is fine to that arch, and you don't need those details below in changelog. So perhaps just say something like: With doing hardware_enable_all() and hardware_disable_all() in kvm_init(), some arch-specific code which must be done before them must also be moved before kvm_init(). Only VMX and MIPS have such code: - For VMX, per-cpu variable loaded_vmcss_on_cpu must be initialized before hardware_disable_all(). Introduce vmx_early_init() to do it and call vmx_eraly_init() before kvm_init() in vmx_init(). - For MIPS, move kvm_init() to the end of kvm_mips_init() (or whatever you like...). > > This patch reorders some function calls as below from (*) (**) (A) and (B) > to (A) (B) and (*). Here (A) and (B) depends on (*), but not (**). By > code inspection, only mips and VMX has the code of (*). No other > arch has empty (*). So refactor mips and VMX and eliminate the > necessity hook for (*) instead of adding an unused hook. > > Before this patch: > - Arch module initialization > - kvm_init() > - kvm_arch_init() > - kvm_arch_check_processor_compat() on each CPUs > - post-arch-specific initialization -- (*): (A) and (B) depends on this > - post-arch-specific initialization -- (**): no dependency to (A) and (B) > > - When creating/deleting the first/last VM > - kvm_arch_hardware_enable() on each CPUs -- (A) > - kvm_arch_hardware_disable() on each CPUs -- (B) > > After this patch: > - Arch module initialization > - kvm_init() > - kvm_arch_init() > - arch-specific initialization -- (*) > - kvm_arch_check_processor_compat() on each CPUs > - kvm_arch_hardware_enable() on each CPUs -- (A) > - kvm_arch_hardware_disable() on each CPUs -- (B) > - post-arch-specific initialization -- (**) > > - When creating/deleting the first/last VM (no logic change) > - kvm_arch_hardware_enable() on each CPUs -- (A) > - kvm_arch_hardware_disable() on each CPUs -- (B) > > Code inspection result: > As long as I inspected, I found only mips and VMX have non-empty (*) or > non-empty (A) or (B). > x86: tested on a real machine > mips: compile test only > powerpc, s390, arm, riscv: code inspection only > > - arch/mips/kvm/mips.c > module init function, kvm_mips_init(), does some initialization after > kvm_init(). Compile test only. > > - arch/x86/kvm/x86.c > - uses vm_list which is statically initialized. > - static_call(kvm_x86_hardware_enable)(); > - SVM: (*) and (**) are empty. > - VMX: initialize percpu variable loaded_vmcss_on_cpu that VMXON uses. > > - arch/powerpc/kvm/powerpc.c > kvm_arch_hardware_enable/disable() are nop > > - arch/s390/kvm/kvm-s390.c > kvm_arch_hardware_enable/disable() are nop > > - arch/arm64/kvm/arm.c > module init function, arm_init(), calls only kvm_init(). > (*) and (**) are empty > > - arch/riscv/kvm/main.c > module init function, riscv_kvm_init(), calls only kvm_init(). > (*) and (**) are empty > > Co-developed-by: Sean Christopherson <seanjc@xxxxxxxxxx> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx> > --- > arch/mips/kvm/mips.c | 16 +++++++++++----- > arch/x86/kvm/vmx/vmx.c | 23 +++++++++++++++++++---- > include/linux/kvm_host.h | 1 + > virt/kvm/kvm_main.c | 31 ++++++++++++++++++++++++------- > 4 files changed, 55 insertions(+), 16 deletions(-) > > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c > index 092d09fb6a7e..fd7339cff57c 100644 > --- a/arch/mips/kvm/mips.c > +++ b/arch/mips/kvm/mips.c > @@ -1642,12 +1642,11 @@ static int __init kvm_mips_init(void) > return -EOPNOTSUPP; > } > > + /* > + * kvm_init() calls kvm_arch_hardware_enable/disable(). The early > + * initialization is needed before calling kvm_init(). > + */ Sorry I take this back. It's upto MIPS reviewer/maintainer to decide whether we need a comment here. > ret = kvm_mips_entry_setup(); > - if (ret) > - return ret; > - > - ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); > - > if (ret) > return ret; > > @@ -1656,6 +1655,13 @@ static int __init kvm_mips_init(void) > > register_die_notifier(&kvm_mips_csr_die_notifier); > > + ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE); > + > + if (ret) { > + unregister_die_notifier(&kvm_mips_csr_die_notifier); > + return ret; > + } > + > return 0; > } > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index c5637a4c1c43..4d7e1073984d 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -8433,6 +8433,23 @@ static void vmx_exit(void) > } > module_exit(vmx_exit); > > +/* > + * Early initialization before kvm_init() so that vmx_hardware_enable/disable() > + * can work. > + */ > +static void __init vmx_init_early(void) > +{ > + int cpu; > + > + /* > + * vmx_hardware_disable() accesses loaded_vmcss_on_cpu list. > + * Initialize the variable before kvm_init() that calls > + * vmx_hardware_enable/disable(). > + */ > + for_each_possible_cpu(cpu) > + INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu)); > +} > + > static int __init vmx_init(void) > { > int r, cpu; > @@ -8470,6 +8487,7 @@ static int __init vmx_init(void) > } > #endif > > + vmx_init_early(); > r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx), > __alignof__(struct vcpu_vmx), THIS_MODULE); > if (r) > @@ -8490,11 +8508,8 @@ static int __init vmx_init(void) > > vmx_setup_fb_clear_ctrl(); > > - for_each_possible_cpu(cpu) { > - INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu)); > - > + for_each_possible_cpu(cpu) > pi_init_cpu(cpu); > - } > > #ifdef CONFIG_KEXEC_CORE > rcu_assign_pointer(crash_vmclear_loaded_vmcss, > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 9643b8eadefe..63d4e3ce3e53 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -1437,6 +1437,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {} > int kvm_arch_hardware_enable(void); > void kvm_arch_hardware_disable(void); > int kvm_arch_hardware_setup(void *opaque); > +int kvm_arch_post_hardware_enable_setup(void *opaque); > void kvm_arch_hardware_unsetup(void); > int kvm_arch_check_processor_compat(void); > int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu); > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 0f5767e5ae45..a43fdbfaede2 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -4995,8 +4995,13 @@ static void hardware_enable_nolock(void *junk) > > cpumask_set_cpu(cpu, cpus_hardware_enabled); > > + r = kvm_arch_check_processor_compat(); > + if (r) > + goto out; > + > r = kvm_arch_hardware_enable(); > > +out: > if (r) { > cpumask_clear_cpu(cpu, cpus_hardware_enabled); > atomic_inc(&hardware_enable_failed); > @@ -5793,9 +5798,9 @@ void kvm_unregister_perf_callbacks(void) > } > #endif > > -static void check_processor_compat(void *rtn) > +__weak int kvm_arch_post_hardware_enable_setup(void *opaque) > { > - *(int *)rtn = kvm_arch_check_processor_compat(); > + return 0; > } > > int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, > @@ -5828,11 +5833,23 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, > if (r < 0) > goto out_free_1; > > - for_each_online_cpu(cpu) { > - smp_call_function_single(cpu, check_processor_compat, &r, 1); > - if (r < 0) > - goto out_free_2; > - } > + /* hardware_enable_nolock() checks CPU compatibility on each CPUs. */ > + r = hardware_enable_all(); > + if (r) > + goto out_free_2; > + /* > + * Arch specific initialization that requires to enable virtualization > + * feature. e.g. TDX module initialization requires VMXON on all > + * present CPUs. > + */ > + kvm_arch_post_hardware_enable_setup(opaque); > + /* > + * Make hardware disabled after the KVM module initialization. KVM > + * enables hardware when the first KVM VM is created and disables > + * hardware when the last KVM VM is destroyed. When no KVM VM is > + * running, hardware is disabled. Keep that semantics. > + */ Except the first sentence, the remaining sentences are more like changelog material. Perhaps just say something below to be more specific on the purpose: /* * Disable hardware on all cpus so that out-of-tree drivers which * also use hardware-assisted virtualization (such as virtualbox * kernel module) can still be loaded when KVM is loaded. */ > + hardware_disable_all(); > > r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING, "kvm/cpu:starting", > kvm_starting_cpu, kvm_dying_cpu); Anyway (although I still think it's better to introduce the __weak kvm_arch_post_hardware_enable_setup() in later patch together with the VMX version to support TDX), with above comments addressed: Acked-by: Kai Huang <kai.huang@xxxxxxxxx> -- Thanks, -Kai