On Thu, Feb 24, 2022, Sean Christopherson wrote:
> Reacquire kvm->srcu in vcpu_run() before returning to the caller if srcu
> was dropped to handle pending work. If the task receives a signal, KVM
> will exit without reacquiring kvm->srcu, resulting in an unbalanced
> unlock kvm_arch_vcpu_ioctl_run(), and eventually hung tasks.
>
>  =====================================
>  WARNING: bad unlock balance detected!
>  5.17.0-rc3+ #749 Not tainted
>  -------------------------------------
>  CPU 0/KVM/1803 is trying to release lock (&kvm->srcu) at:
>  [<ffffffff81042a19>] kvm_arch_vcpu_ioctl_run+0x669/0x1f60
>  but there are no more locks to release!
>
>  other info that might help us debug this:
>  1 lock held by CPU 0/KVM/1803:
>   #0: ffff88810489c0b0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x77/0x690
>
>  stack backtrace:
>  CPU: 7 PID: 1803 Comm: CPU 0/KVM Not tainted 5.17.0-rc3+ #749
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x34/0x44
>   lock_release+0x1b4/0x240
>   kvm_arch_vcpu_ioctl_run+0x680/0x1f60
>   kvm_vcpu_ioctl+0x279/0x690
>   __x64_sys_ioctl+0x83/0xb0
>   do_syscall_64+0x3b/0xc0
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>   </TASK>
>  INFO: task stable:2347 blocked for more than 120 seconds.
>  Not tainted 5.17.0-rc3+ #749
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  task:stable state:D stack: 0 pid: 2347 ppid: 2340 flags:0x00000000
>  Call Trace:
>   <TASK>
>   __schedule+0x328/0xa00
>   schedule+0x44/0xb0
>   schedule_timeout+0x26f/0x300
>   wait_for_completion+0x84/0xe0
>   __synchronize_srcu.part.0+0x7a/0xa0
>   kvm_swap_active_memslots+0x141/0x180
>   kvm_set_memslot+0x2f9/0x470
>   kvm_set_memory_region+0x29/0x40
>   kvm_vm_ioctl+0x2c3/0xd70
>   __x64_sys_ioctl+0x83/0xb0
>   do_syscall_64+0x3b/0xc0
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>   </TASK>
>  INFO: lockdep is turned off.

Ugh, the task hung is actually a different mess introduced by the same patch.
I suspect I'm hitting the one Like reported. I'll get a fix posted shortly...
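
For anyone following along, the "bad unlock balance" half of the quoted changelog can be illustrated with a userspace toy model. This is a sketch only: it is not the KVM code and not the forthcoming fix. Every name here (toy_srcu, toy_vcpu_run, toy_ioctl_run, the reacquire_on_exit flag) is hypothetical, and the "lock" is just a nesting counter standing in for the srcu read-side section.

```c
#include <stdbool.h>

/* Toy stand-in for an SRCU read-side section: only a nesting depth. */
struct toy_srcu {
	int depth;
};

static void toy_read_lock(struct toy_srcu *s)   { s->depth++; }
static void toy_read_unlock(struct toy_srcu *s) { s->depth--; }

/*
 * Hypothetical inner loop, entered with the read lock held.  It drops the
 * lock to handle pending work.  If a signal is pending it bails out early,
 * and only reacquires the lock first when reacquire_on_exit is true.
 */
static int toy_vcpu_run(struct toy_srcu *s, bool signal_pending,
			bool reacquire_on_exit)
{
	toy_read_unlock(s);		/* dropped to handle pending work */

	if (signal_pending) {
		if (reacquire_on_exit)
			toy_read_lock(s);	/* restore the caller's expectation */
		return -1;		/* stand-in for an -EINTR style exit */
	}

	toy_read_lock(s);		/* normal path: reacquire and continue */
	return 0;
}

/*
 * Hypothetical outer caller: lock, run, unlock.  Returns the final depth,
 * where 0 means balanced and a negative value means one unlock too many.
 */
static int toy_ioctl_run(bool signal_pending, bool reacquire_on_exit)
{
	struct toy_srcu s = { 0 };

	toy_read_lock(&s);
	(void)toy_vcpu_run(&s, signal_pending, reacquire_on_exit);
	toy_read_unlock(&s);

	return s.depth;
}
```

In this model, toy_ioctl_run(true, false) ends with a negative depth, which corresponds to the unlock that lockdep complains about in the splat, while toy_ioctl_run(true, true) stays balanced. The model says nothing about the separate hung task issue.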