Re: [PATCH v2 1/6] KVM: Register cpuhp and syscore callbacks when enabling hardware

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 29 May 2024 07:29:18 -0700

On Wed, May 22, 2024, Chao Gao wrote:
> On Tue, May 21, 2024 at 07:28:22PM -0700, Sean Christopherson wrote:
> >Register KVM's cpuhp and syscore callbacks when enabling virtualization
> >in hardware instead of registering the callbacks during initialization,
> >and let the CPU up/down framework invoke the inner enable/disable
> >functions.  Registering the callbacks during initialization makes things
> >more complex than they need to be, as KVM needs to be very careful about
> >handling races between CPUs being onlined/offlined and hardware
> >being enabled/disabled.
> >
> >Intel TDX support will require KVM to enable virtualization during KVM
> >initialization, i.e. will add another wrinkle to things, at which point
> >sorting out the potential races with kvm_usage_count would become even
> >more complex.
> >
> 
> >Use a dedicated mutex to guard kvm_usage_count, as taking kvm_lock outside
> >cpu_hotplug_lock is disallowed.  Ideally, KVM would *always* take kvm_lock
> >outside cpu_hotplug_lock, but KVM x86 takes kvm_lock in several notifiers
> >that may be called under cpus_read_lock().  kvmclock_cpufreq_notifier() in
> >particular has callchains that are infeasible to guarantee will never be
> >called with cpu_hotplug_lock held.  And practically speaking, using a
> >dedicated mutex is a non-issue as the cost is a few bytes for all of KVM.
> 
> Shouldn't this part go to a separate patch?
> 
> I think so because you post a lockdep splat which indicates the existing
> locking order is problematic. So, using a dedicated mutex actually fixes
> some bug and needs a "Fixes:" tag, so that it can be backported separately.

Oooh, good point.  I'll try to re-decipher the lockdep splat, and go this route
if using a dedicated lock does indeed fix a real issue.

> And Documentation/virt/kvm/locking.rst needs to be updated accordingly.
> 
> Actually, you are doing a partial revert to the commit:
> 
>   0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
> 
> Perhaps you can handle this as a revert. After that, change the lock from
> a raw_spinlock_t to a mutex.

Hmm, I'd prefer to not revert to a spinlock, even temporarily.
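
For anyone following along, here is a rough userspace sketch of the scheme under
discussion: a dedicated mutex guards the usage count, and the callbacks are
registered on the first enable and unregistered on the last disable.  This is
an illustration only, not the patch.  It uses pthreads rather than kernel
primitives, and every name except the idea of kvm_usage_count
(usage_lock, register_callbacks(), enable_virtualization(), etc.) is
hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins; none of this is the kernel API. */
static pthread_mutex_t usage_lock = PTHREAD_MUTEX_INITIALIZER; /* dedicated lock */
static int usage_count;          /* models kvm_usage_count */
static int callbacks_registered; /* models cpuhp/syscore registration state */

/* Stand-in for registering the cpuhp and syscore callbacks. */
static int register_callbacks(void)
{
	callbacks_registered = 1;
	return 0;
}

/* Stand-in for unregistering the cpuhp and syscore callbacks. */
static void unregister_callbacks(void)
{
	callbacks_registered = 0;
}

/* First user registers the callbacks; later users only bump the count. */
int enable_virtualization(void)
{
	int r = 0;

	pthread_mutex_lock(&usage_lock);
	if (usage_count++ == 0) {
		r = register_callbacks();
		if (r)
			--usage_count; /* roll back on failure */
	}
	pthread_mutex_unlock(&usage_lock);
	return r;
}

/* Last user unregisters the callbacks. */
void disable_virtualization(void)
{
	pthread_mutex_lock(&usage_lock);
	assert(usage_count > 0); /* catch unbalanced disable */
	if (--usage_count == 0)
		unregister_callbacks();
	pthread_mutex_unlock(&usage_lock);
}
```

Because the count has its own lock, nothing here needs the broader lock that
the notifiers take, which is the property the dedicated mutex is meant to
provide.  A mutex (rather than a spinlock) also allows the registration step
to sleep.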