Re: Qemu KVM thread spins at 100% CPU usage on scsi hot-unplug (kernel 6.6.8 guest)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Am 29.12.23 um 06:46 schrieb Steven Haigh:
> On 29/12/23 00:18, Lukas Wunner wrote:
>> On Thu, Dec 28, 2023 at 01:03:10PM +1100, Steven Haigh wrote:
>>> At some point in kernel 6.6.x, SCSI hotplug in qemu VMs broke. This was
>>> mostly fixed in the following commit to release 6.6.8:
>>>     commit 5cc8d88a1b94b900fd74abda744c29ff5845430b
>>>     Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
>>>     Date:   Thu Dec 14 09:08:56 2023 -0600
>>>     Revert "PCI: acpiphp: Reassign resources on bridge if necessary"
>>>
>>> After this commit, the SCSI block device is hotplugged correctly, and
>>> a device node as /dev/sdX appears within the qemu VM.
>>>
>>> New problem:
>>>
>>> When the same SCSI block device is hot-unplugged, the QEMU KVM
>>> process will
>>> spin at 100% CPU usage. The guest shows no CPU being used via top,
>>> but the
>>> host will continue to spin in the KVM thread until the VM is rebooted.
>>
>> Find out the PID of the qemu process on the host, then cat
>> /proc/$PID/stack
>> to see where the CPU time is spent.
> 
> Thanks for the tip - I'll certainly do that.
> 
> Annoyingly, since I posted this report originally, then adding in a new
> report to the kernel.org lists in this, I have been unable to reproduce
> this problem. I have successfully done ~22 scsi hotplug / remove cycles
> and none resulted in reproducing the issue.
> 
> Kernel versions are still the same on both proxmox host and the Fedora
> guest - however I see an update on the host of the qemu-kvm packages in
> Proxmox. The proxmox host hasn't even been rebooted in this time.
> 
> I wonder if the initial revert included in 6.6.8 fixed the main problem,
> and the later update to qemu-kvm packages on the proxmox host followed
> by the last reboot of the VM with the new KVM package sorted the second
> issue.
> 
> Seeing as I can no longer reproduce this reliably - whereas it was 100%
> reproducible prior, maybe I'm now chasing ghosts.
> 

That sounds likely. Version pve-qemu-kvm=8.1.2-5 had a regression where
an IO thread in QEMU could start spinning after a drain (which happens
during hotplug on the QEMU side). It was introduced by an attempted fix
for a much rarer problem [0] and was reverted in pve-qemu-kvm=8.1.2-6
[1]. A proper fix is still being worked on [2].

[0]:
https://git.proxmox.com/?p=pve-qemu.git;a=commit;h=6b7c1815e1c89cb66ff48fbba6da69fe6d254630
[1]:
https://git.proxmox.com/?p=pve-qemu.git;a=commit;h=2a49e667bae33f2a5c6ba6b59a0cd26387f73a27
[2]: https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg01900.html

Best Regards,
Fiona

> I'll still continue to monitor - as I normally do this SCSI hotplug ~3
> times per week doing backups to different external HDDs - so if I do
> observe it again, I'll grab the stack and reply to this thread again
> with what I can find.
> 
> Until then, I don't want to waste other peoples time also chasing ghosts :)
> 






[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux