Re: [PATCH v2 1/6] KVM: Register cpuhp and syscore callbacks when enabling hardware

Chao Gao <chao.gao@xxxxxxxxx> · Wed, 22 May 2024 14:10:13 +0800

On Tue, May 21, 2024 at 07:28:22PM -0700, Sean Christopherson wrote:
>Register KVM's cpuhp and syscore callback when enabling virtualization
>in hardware instead of registering the callbacks during initialization,
>and let the CPU up/down framework invoke the inner enable/disable
>functions.  Registering the callbacks during initialization makes things
>more complex than they need to be, as KVM needs to be very careful about
>handling races between enabling CPUs being onlined/offlined and hardware
>being enabled/disabled.
>
>Intel TDX support will require KVM to enable virtualization during KVM
>initialization, i.e. will add another wrinkle to things, at which point
>sorting out the potential races with kvm_usage_count would become even
>more complex.
>

>Use a dedicated mutex to guard kvm_usage_count, as taking kvm_lock outside
>cpu_hotplug_lock is disallowed.  Ideally, KVM would *always* take kvm_lock
>outside cpu_hotplug_lock, but KVM x86 takes kvm_lock in several notifiers
>that may be called under cpus_read_lock().  kvmclock_cpufreq_notifier() in
>particular has callchains that are infeasible to guarantee will never be
>called with cpu_hotplug_lock held.  And practically speaking, using a
>dedicated mutex is a non-issue as the cost is a few bytes for all of KVM.

Shouldn't this part go to a separate patch?

I think so, because you posted a lockdep splat which indicates that the
existing locking order is problematic. So using a dedicated mutex actually
fixes a bug and needs a "Fixes:" tag, so that it can be backported separately.

And Documentation/virt/kvm/locking.rst needs to be updated accordingly.

Actually, you are doing a partial revert of the commit:

  0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")

Perhaps you can handle this as a revert. After that, change the lock from
a raw_spinlock_t to a mutex.
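
To illustrate the end state being discussed, here is a rough userspace
sketch of a usage count guarded by its own dedicated mutex. It is only an
analogy, not the kernel code: pthread_mutex_t stands in for a kernel mutex,
and usage_lock, usage_get(), usage_put(), enable_hw() and disable_hw() are
hypothetical names made up for this example.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical userspace stand-ins, not KVM's actual code.
 * usage_lock plays the role of a dedicated lock that guards only the
 * usage count, so it never has to nest with any broader lock.
 */
static pthread_mutex_t usage_lock = PTHREAD_MUTEX_INITIALIZER;
static int usage_count;
static int hw_enabled;

/* Placeholder for "enable virtualization in hardware"; always succeeds. */
static int enable_hw(void)
{
	hw_enabled = 1;
	return 0;
}

/* Placeholder for "disable virtualization in hardware". */
static void disable_hw(void)
{
	hw_enabled = 0;
}

/* First user enables the hardware; later users only bump the count. */
int usage_get(void)
{
	int r = 0;

	pthread_mutex_lock(&usage_lock);
	if (usage_count == 0)
		r = enable_hw();
	if (!r)
		usage_count++;
	pthread_mutex_unlock(&usage_lock);

	return r;
}

/* Last user disables the hardware. */
void usage_put(void)
{
	pthread_mutex_lock(&usage_lock);
	assert(usage_count > 0);
	if (--usage_count == 0)
		disable_hw();
	pthread_mutex_unlock(&usage_lock);
}
```

Because the count has its own lock, callers that already hold some other
lock never need to take that lock again just to touch the count, which is
the property that avoids the ordering problem in this simplified model.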