Re: [PATCH 3/4] KVM: Register cpuhp and syscore callbacks when enabling hardware

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 25, 2024, Sean Christopherson wrote:
> Register KVM's cpuhp and syscore callback when enabling virtualization
> in hardware instead of registering the callbacks during initialization,
> and let the CPU up/down framework invoke the inner enable/disable
> functions.  Registering the callbacks during initialization makes things
> more complex than they need to be, as KVM needs to be very careful about
> handling races between enabling CPUs being onlined/offlined and hardware
> being enabled/disabled.
> 
> Intel TDX support will require KVM to enable virtualization during KVM
> initialization, i.e. will add another wrinkle to things, at which point
> sorting out the potential races with kvm_usage_count would become even
> more complex.
> +static int hardware_enable_all(void)
> +{
> +	int r;
> +
> +	guard(mutex)(&kvm_lock);
> +
> +	if (kvm_usage_count++)
> +		return 0;
> +
> +	r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
> +			      kvm_online_cpu, kvm_offline_cpu);
> +	if (r)
> +		return r;

There's a lock ordering issue here.  KVM currently takes kvm_lock inside
cpu_hotplug_lock, but this code does the opposite.  I need to take a closer look
at the locking, as I'm not entirely certain that the existing ordering is correct
or ideal.  E.g. cpu_hotplug_lock is taken when updating static keys, static calls,
etc., which makes taking cpu_hotplug_lock outside kvm_lock dicey, as flows that
take kvm_lock then need to be very careful to never trigger seemingly innocuous
updates.

And this lockdep splat that I've now hit twice with the current implementation
suggests that cpu_hotplug_lock => kvm_lock is already unsafe/broken (I need to
re-decipher the splat; I _think_ mostly figured it out last week, but then forgot
over the weekend).


[48419.244819] ======================================================
[48419.250999] WARNING: possible circular locking dependency detected
[48419.257179] 6.9.0-smp--04b1c6b4841d-next #301 Tainted: G S         O      
[48419.264050] ------------------------------------------------------
[48419.270229] tee/27085 is trying to acquire lock:
[48419.274849] ffffb5fdd4e430a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm]
[48419.284085] 
               but task is already holding lock:
[48419.289918] ffffffffc06ccba8 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm]
[48419.298386] 
               which lock already depends on the new lock.

[48419.306559] 
               the existing dependency chain (in reverse order) is:
[48419.314040] 
               -> #3 (kvm_lock){+.+.}-{3:3}:
[48419.319523]        __mutex_lock+0x6a/0xb40
[48419.323622]        mutex_lock_nested+0x1f/0x30
[48419.328071]        kvm_dev_ioctl+0x4fb/0xe50 [kvm]
[48419.332898]        __se_sys_ioctl+0x7b/0xd0
[48419.337082]        __x64_sys_ioctl+0x21/0x30
[48419.341357]        do_syscall_64+0x8b/0x170
[48419.345540]        entry_SYSCALL_64_after_hwframe+0x71/0x79
[48419.351115] 
               -> #2 (cpu_hotplug_lock){++++}-{0:0}:
[48419.357294]        cpus_read_lock+0x2e/0xb0
[48419.361480]        static_key_slow_inc+0x16/0x30
[48419.366098]        kvm_lapic_set_base+0x6a/0x1c0 [kvm]
[48419.371298]        kvm_set_apic_base+0x8f/0xe0 [kvm]
[48419.376298]        kvm_set_msr_common+0xa29/0x1080 [kvm]
[48419.381645]        vmx_set_msr+0xa54/0xbe0 [kvm_intel]
[48419.386795]        __kvm_set_msr+0xb6/0x1a0 [kvm]
[48419.391535]        kvm_arch_vcpu_ioctl+0xf66/0x1150 [kvm]
[48419.396970]        kvm_vcpu_ioctl+0x485/0x5b0 [kvm]
[48419.401881]        __se_sys_ioctl+0x7b/0xd0
[48419.406067]        __x64_sys_ioctl+0x21/0x30
[48419.410342]        do_syscall_64+0x8b/0x170
[48419.414528]        entry_SYSCALL_64_after_hwframe+0x71/0x79
[48419.420099] 
               -> #1 (&kvm->srcu){.?.+}-{0:0}:
[48419.425758]        __synchronize_srcu+0x44/0x1a0
[48419.430378]        synchronize_srcu_expedited+0x21/0x30
[48419.435603]        kvm_swap_active_memslots+0x113/0x1c0 [kvm]
[48419.441385]        kvm_set_memslot+0x360/0x620 [kvm]
[48419.446387]        __kvm_set_memory_region+0x27b/0x300 [kvm]
[48419.452078]        kvm_vm_ioctl_set_memory_region+0x43/0x60 [kvm]
[48419.458207]        kvm_vm_ioctl+0x295/0x650 [kvm]
[48419.462945]        __se_sys_ioctl+0x7b/0xd0
[48419.467133]        __x64_sys_ioctl+0x21/0x30
[48419.471406]        do_syscall_64+0x8b/0x170
[48419.475590]        entry_SYSCALL_64_after_hwframe+0x71/0x79
[48419.481164] 
               -> #0 (&kvm->slots_lock){+.+.}-{3:3}:
[48419.487343]        __lock_acquire+0x15ef/0x2e40
[48419.491876]        lock_acquire+0xe0/0x260
[48419.495977]        __mutex_lock+0x6a/0xb40
[48419.500076]        mutex_lock_nested+0x1f/0x30
[48419.504521]        set_nx_huge_pages+0x179/0x1e0 [kvm]
[48419.509694]        param_attr_store+0x93/0x100
[48419.514142]        module_attr_store+0x22/0x40
[48419.518587]        sysfs_kf_write+0x81/0xb0
[48419.522774]        kernfs_fop_write_iter+0x133/0x1d0
[48419.527738]        vfs_write+0x317/0x380
[48419.531663]        ksys_write+0x70/0xe0
[48419.535505]        __x64_sys_write+0x1f/0x30
[48419.539777]        do_syscall_64+0x8b/0x170
[48419.543961]        entry_SYSCALL_64_after_hwframe+0x71/0x79
[48419.549534] 
               other info that might help us debug this:

[48419.557534] Chain exists of:
                 &kvm->slots_lock --> cpu_hotplug_lock --> kvm_lock

[48419.567873]  Possible unsafe locking scenario:

[48419.573793]        CPU0                    CPU1
[48419.578325]        ----                    ----
[48419.582859]   lock(kvm_lock);
[48419.585831]                                lock(cpu_hotplug_lock);
[48419.592011]                                lock(kvm_lock);
[48419.597499]   lock(&kvm->slots_lock);
[48419.601173] 
                *** DEADLOCK ***

[48419.607090] 5 locks held by tee/27085:
[48419.610841]  #0: ffffa0dcc92eb410 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0xe4/0x380
[48419.618765]  #1: ffffa0dce221ac88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xd8/0x1d0
[48419.627553]  #2: ffffa11c4d6bef28 (kn->active#257){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xe0/0x1d0
[48419.636684]  #3: ffffffffc06e3c50 (&mod->param_lock){+.+.}-{3:3}, at: param_attr_store+0x57/0x100
[48419.645575]  #4: ffffffffc06ccba8 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm]
[48419.654564] 
               stack backtrace:
[48419.658925] CPU: 93 PID: 27085 Comm: tee Tainted: G S         O       6.9.0-smp--04b1c6b4841d-next #301
[48419.668312] Hardware name: Google Interlaken/interlaken, BIOS 0.20231025.0-0 10/25/2023
[48419.676314] Call Trace:
[48419.678770]  <TASK>
[48419.680878]  dump_stack_lvl+0x83/0xc0
[48419.684552]  dump_stack+0x14/0x20
[48419.687880]  print_circular_bug+0x2f0/0x300
[48419.692068]  check_noncircular+0x103/0x120
[48419.696163]  ? __lock_acquire+0x5e3/0x2e40
[48419.700266]  __lock_acquire+0x15ef/0x2e40
[48419.704286]  ? __lock_acquire+0x5e3/0x2e40
[48419.708387]  ? __lock_acquire+0x5e3/0x2e40
[48419.712486]  ? __lock_acquire+0x5e3/0x2e40
[48419.716586]  lock_acquire+0xe0/0x260
[48419.720164]  ? set_nx_huge_pages+0x179/0x1e0 [kvm]
[48419.725060]  ? lock_acquire+0xe0/0x260
[48419.728822]  ? param_attr_store+0x57/0x100
[48419.732921]  ? set_nx_huge_pages+0x179/0x1e0 [kvm]
[48419.737768]  __mutex_lock+0x6a/0xb40
[48419.741359]  ? set_nx_huge_pages+0x179/0x1e0 [kvm]
[48419.746207]  ? set_nx_huge_pages+0x14a/0x1e0 [kvm]
[48419.751054]  ? param_attr_store+0x57/0x100
[48419.755158]  ? __mutex_lock+0x240/0xb40
[48419.759005]  ? param_attr_store+0x57/0x100
[48419.763107]  mutex_lock_nested+0x1f/0x30
[48419.767031]  set_nx_huge_pages+0x179/0x1e0 [kvm]
[48419.771705]  param_attr_store+0x93/0x100
[48419.775629]  module_attr_store+0x22/0x40
[48419.779556]  sysfs_kf_write+0x81/0xb0
[48419.783222]  kernfs_fop_write_iter+0x133/0x1d0
[48419.787668]  vfs_write+0x317/0x380
[48419.791084]  ksys_write+0x70/0xe0
[48419.794401]  __x64_sys_write+0x1f/0x30
[48419.798152]  do_syscall_64+0x8b/0x170
[48419.801819]  entry_SYSCALL_64_after_hwframe+0x71/0x79
[48419.806872] RIP: 0033:0x7f98eff1ee0d
[48419.810465] Code: e5 41 57 41 56 53 50 49 89 d6 49 89 f7 89 fb 48 8b 05 37 7c 07 00 83 38 00 75 28 b8 01 00 00 00 89 df 4c 89 fe 4c 89 f2 0f 05 <48> 89 c3 48 3d 01 f0 ff ff 73 3a 48 89 d8 48 83 c4 08 5b 41 5e 41
[48419.829213] RSP: 002b:00007ffd5dee15c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[48419.836792] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f98eff1ee0d
[48419.843931] RDX: 0000000000000002 RSI: 00007ffd5dee16d0 RDI: 0000000000000005
[48419.851065] RBP: 00007ffd5dee15e0 R08: 00007f98efe161d2 R09: 0000000000000001
[48419.858196] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000002
[48419.865328] R13: 00007ffd5dee16d0 R14: 0000000000000002 R15: 00007ffd5dee16d0
[48419.872472]  </TASK>






[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux