On Thu, Apr 18, 2024, Kai Huang wrote:
> On 18/04/2024 2:40 am, Sean Christopherson wrote:
> > This way, architectures that aren't saddled with out-of-tree hypervisors can do
> > the dead simple thing of enabling hardware during their initialization sequence,
> > and the TDX code is much more sane, e.g. invoke kvm_x86_enable_virtualization()
> > during late_hardware_setup(), and kvm_x86_disable_virtualization() during module
> > exit (presumably).
>
> Fine to me, given I am not familiar with other ARCHs, assuming always enable
> virtualization when KVM present is fine to them. :-)
>
> Two questions below:
>
> > +int kvm_x86_enable_virtualization(void)
> > +{
> > +	int r;
> > +
> > +	guard(mutex)(&vendor_module_lock);
>
> It's a little bit odd to take the vendor_module_lock mutex.
>
> It is called by kvm_arch_init_vm(), so more reasonably we should still use
> kvm_lock?

I think this should take an x86-specific lock, since it's guarding x86-specific
data. And vendor_module_lock fits the bill perfectly. Well, except for the
name, and I definitely have no objection to renaming it.

> Also, if we invoke kvm_x86_enable_virtualization() from
> kvm_x86_ops->late_hardware_setup(), then IIUC we will deadlock here because
> kvm_x86_vendor_init() already takes the vendor_module_lock?

Ah, yeah. Oh, duh. I think the reason I didn't initially suggest
late_hardware_setup() is that I was assuming/hoping TDX setup could be done
after kvm_x86_vendor_init(). E.g. in vt_init() or whatever it gets called:

	r = kvm_x86_vendor_init(...);
	if (r)
		return r;

	if (enable_tdx) {
		r = tdx_blah_blah_blah();
		if (r)
			goto vendor_exit;
	}

> > +	if (kvm_usage_count++)
> > +		return 0;
> > +
> > +	r = kvm_enable_virtualization();
> > +	if (r)
> > +		--kvm_usage_count;
> > +
> > +	return r;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_x86_enable_virtualization);
> > +
>
> [...]
>
> > +int kvm_enable_virtualization(void)
> >  {
> > +	int r;
> > +
> > +	r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
> > +			      kvm_online_cpu, kvm_offline_cpu);
> > +	if (r)
> > +		return r;
> > +
> > +	register_syscore_ops(&kvm_syscore_ops);
> > +
> > +	/*
> > +	 * Manually undo virtualization enabling if the system is going down.
> > +	 * If userspace initiated a forced reboot, e.g. reboot -f, then it's
> > +	 * possible for an in-flight module load to enable virtualization
> > +	 * after syscore_shutdown() is called, i.e. without kvm_shutdown()
> > +	 * being invoked. Note, this relies on system_state being set _before_
> > +	 * kvm_shutdown(), e.g. to ensure either kvm_shutdown() is invoked
> > +	 * or this CPU observes the impending shutdown. Which is why KVM uses
> > +	 * a syscore ops hook instead of registering a dedicated reboot
> > +	 * notifier (the latter runs before system_state is updated).
> > +	 */
> > +	if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
> > +	    system_state == SYSTEM_RESTART) {
> > +		unregister_syscore_ops(&kvm_syscore_ops);
> > +		cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
> > +		return -EBUSY;
> > +	}
> > +
>
> Aren't we also supposed to do:
>
> 	on_each_cpu(__kvm_enable_virtualization, NULL, 1);
>
> here?

No, cpuhp_setup_state() invokes the callback, kvm_online_cpu(), on each CPU.
I.e. KVM has been doing things the hard way by using cpuhp_setup_state_nocalls().
That's part of the complexity I would like to get rid of.
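
For anyone following along, the usage-count logic in
kvm_x86_enable_virtualization() can be modelled in userspace roughly as below.
This is only a sketch, not kernel code: all of the model_* names are made up
for illustration, a pthread mutex stands in for vendor_module_lock, and
model_hw_enable() stands in for kvm_enable_virtualization(). The disable side
is an assumption about what the matching helper would look like, since it
isn't shown in the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static int model_usage_count;      /* plays the role of kvm_usage_count */
static int model_hw_enabled;       /* 1 while "virtualization" is enabled */
static int model_fail_next_enable; /* test knob: force the next enable to fail */

/* Stand-in for kvm_enable_virtualization(): may fail with -EBUSY. */
static int model_hw_enable(void)
{
	if (model_fail_next_enable) {
		model_fail_next_enable = 0;
		return -EBUSY;
	}
	model_hw_enabled = 1;
	return 0;
}

static void model_hw_disable(void)
{
	model_hw_enabled = 0;
}

/* Only the first user actually enables hardware; later users just count. */
int model_enable_virtualization(void)
{
	int r = 0;

	pthread_mutex_lock(&model_lock);
	if (model_usage_count++ == 0) {
		r = model_hw_enable();
		if (r)
			--model_usage_count; /* roll back the count on failure */
	}
	pthread_mutex_unlock(&model_lock);
	return r;
}

/* Only the last user actually disables hardware. */
void model_disable_virtualization(void)
{
	pthread_mutex_lock(&model_lock);
	assert(model_usage_count > 0);
	if (--model_usage_count == 0)
		model_hw_disable();
	pthread_mutex_unlock(&model_lock);
}
```

Note the mutex here is non-recursive, same as a kernel mutex, so calling
model_enable_virtualization() from a path that already holds model_lock would
self-deadlock. That is the same problem Kai points out with calling into this
from late_hardware_setup() while kvm_x86_vendor_init() holds
vendor_module_lock.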