On Wed, May 22, 2024, Chao Gao wrote:
> On Tue, May 21, 2024 at 07:28:22PM -0700, Sean Christopherson wrote:
> >Register KVM's cpuhp and syscore callback when enabling virtualization
> >in hardware instead of registering the callbacks during initialization,
> >and let the CPU up/down framework invoke the inner enable/disable
> >functions. Registering the callbacks during initialization makes things
> >more complex than they need to be, as KVM needs to be very careful about
> >handling races between enabling CPUs being onlined/offlined and hardware
> >being enabled/disabled.
> >
> >Intel TDX support will require KVM to enable virtualization during KVM
> >initialization, i.e. will add another wrinkle to things, at which point
> >sorting out the potential races with kvm_usage_count would become even
> >more complex.
> >
>
> >Use a dedicated mutex to guard kvm_usage_count, as taking kvm_lock outside
> >cpu_hotplug_lock is disallowed. Ideally, KVM would *always* take kvm_lock
> >outside cpu_hotplug_lock, but KVM x86 takes kvm_lock in several notifiers
> >that may be called under cpus_read_lock(). kvmclock_cpufreq_notifier() in
> >particular has callchains that are infeasible to guarantee will never be
> >called with cpu_hotplug_lock held. And practically speaking, using a
> >dedicated mutex is a non-issue as the cost is a few bytes for all of KVM.
>
> Shouldn't this part go to a separate patch?
>
> I think so because you post a lockdep splat which indicates the existing
> locking order is problematic. So, using a dedicated mutex actually fixes
> some bug and needs a "Fixes:" tag, so that it can be backported separately.

Oooh, good point. I'll try to re-decipher the lockdep splat, and go this route
if using a dedicated lock does indeed fix a real issue.

> And Documentation/virt/kvm/locking.rst needs to be updated accordingly.
>
> Actually, you are doing a partial revert to the commit:
>
>   0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
>
> Perhaps you can handle this as a revert. After that, change the lock from
> a raw_spinlock_t to a mutex.

Hmm, I'd prefer to not revert to a spinlock, even temporarily.
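
For reference, the pattern being debated can be sketched in userspace roughly
as below. This is an illustration only, not the code from the patch: a pthread
mutex stands in for a kernel mutex, and usage_lock, hw_enable_all(),
hw_disable_all() and fail_next_enable are hypothetical names. The point is
that the usage count has its own small lock, so callers never need the
broader lock whose ordering against cpu_hotplug_lock is the problem.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
static pthread_mutex_t usage_lock = PTHREAD_MUTEX_INITIALIZER;
static int usage_count;        /* guarded by usage_lock */
static bool hw_enabled;        /* models "virtualization enabled in hardware" */
static bool fail_next_enable;  /* lets a caller simulate an enable failure */

/* Models registering the callbacks and enabling hardware; 0 on success. */
static int hw_enable_all(void)
{
	if (fail_next_enable) {
		fail_next_enable = false;
		return -1;
	}
	hw_enabled = true;
	return 0;
}

/* Models unregistering the callbacks and disabling hardware. */
static void hw_disable_all(void)
{
	hw_enabled = false;
}

/* First user enables hardware; later users only bump the count. */
int enable_virtualization(void)
{
	int r = 0;

	pthread_mutex_lock(&usage_lock);
	if (usage_count++ == 0) {
		r = hw_enable_all();
		if (r)
			usage_count--;	/* roll back on failure */
	}
	pthread_mutex_unlock(&usage_lock);
	return r;
}

/* Last user disables hardware. */
void disable_virtualization(void)
{
	pthread_mutex_lock(&usage_lock);
	if (--usage_count == 0)
		hw_disable_all();
	pthread_mutex_unlock(&usage_lock);
}
```

Because the dedicated lock is only ever taken inside these two helpers, its
ordering relative to other locks is easy to reason about, which is the
property the quoted changelog is after.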