On Tue, May 21, 2024 at 07:28:22PM -0700, Sean Christopherson wrote:
>Register KVM's cpuhp and syscore callback when enabling virtualization
>in hardware instead of registering the callbacks during initialization,
>and let the CPU up/down framework invoke the inner enable/disable
>functions. Registering the callbacks during initialization makes things
>more complex than they need to be, as KVM needs to be very careful about
>handling races between enabling CPUs being onlined/offlined and hardware
>being enabled/disabled.
>
>Intel TDX support will require KVM to enable virtualization during KVM
>initialization, i.e. will add another wrinkle to things, at which point
>sorting out the potential races with kvm_usage_count would become even
>more complex.
>
>Use a dedicated mutex to guard kvm_usage_count, as taking kvm_lock outside
>cpu_hotplug_lock is disallowed. Ideally, KVM would *always* take kvm_lock
>outside cpu_hotplug_lock, but KVM x86 takes kvm_lock in several notifiers
>that may be called under cpus_read_lock(). kvmclock_cpufreq_notifier() in
>particular has callchains that are infeasible to guarantee will never be
>called with cpu_hotplug_lock held. And practically speaking, using a
>dedicated mutex is a non-issue as the cost is a few bytes for all of KVM.

Shouldn't this part go into a separate patch?

I think so, because you posted a lockdep splat which indicates that the
existing locking order is problematic. So using a dedicated mutex actually
fixes a bug and needs a "Fixes:" tag so that it can be backported
separately. Documentation/virt/kvm/locking.rst also needs to be updated
accordingly.

Actually, you are doing a partial revert of commit:

  0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")

Perhaps you can handle this as a revert. After that, change the lock from
a raw_spinlock_t to a mutex.
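
For illustration only, the end state being discussed (a small lock that
guards nothing but the usage count, so it never has to nest with a broader
lock) can be sketched in userspace C. This is not the kernel code: it uses
a pthread mutex in place of a kernel mutex, and usage_lock, usage_count,
hw_enabled, enable_virt() and disable_virt() are all hypothetical names
made up for this sketch.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical userspace analog, not KVM's implementation.
 * usage_lock protects only usage_count and hw_enabled; no other
 * lock is taken while it is held, so it imposes no ordering
 * constraints on the rest of the program.
 */
static pthread_mutex_t usage_lock = PTHREAD_MUTEX_INITIALIZER;
static int usage_count;
static int hw_enabled;

/* The first user (0 -> 1 transition) turns the "hardware" on. */
static void enable_virt(void)
{
	pthread_mutex_lock(&usage_lock);
	if (usage_count++ == 0)
		hw_enabled = 1;
	pthread_mutex_unlock(&usage_lock);
}

/* The last user (1 -> 0 transition) turns the "hardware" off. */
static void disable_virt(void)
{
	pthread_mutex_lock(&usage_lock);
	assert(usage_count > 0);	/* unbalanced disable is a caller bug */
	if (--usage_count == 0)
		hw_enabled = 0;
	pthread_mutex_unlock(&usage_lock);
}
```

The point of the sketch is only that the count and the enable/disable
transition sit under one narrow lock; how that maps onto the cpuhp and
syscore callbacks is left to the actual patch.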