Hi all,
Below is a summary of what I'm seeing. Please feel free to contact me directly for any further information that I may
have missed. I'm not subscribed to either kernel.org mailing list, so please CC me on any replies.
History:
At some point in kernel 6.6.x, SCSI hotplug in qemu VMs broke. This was mostly fixed by the following commit, which is
included in release 6.6.8:
commit 5cc8d88a1b94b900fd74abda744c29ff5845430b
Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
Date: Thu Dec 14 09:08:56 2023 -0600
Revert "PCI: acpiphp: Reassign resources on bridge if necessary"
After this commit, the SCSI block device is hotplugged correctly, and a device node such as /dev/sdX appears within the
qemu VM.
New problem:
When the same SCSI block device is hot-unplugged, the QEMU KVM process on the host spins at 100% CPU usage. top inside
the guest shows no CPU being used, but the KVM thread on the host keeps spinning until the VM is rebooted.
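To narrow down which thread of the QEMU process is spinning, something like the following could be run on the host.
This is only a sketch: it assumes the PID of the VM's QEMU process is already known and is passed as the first
argument, and the function names are made up for illustration. It samples utime + stime from
/proc/<pid>/task/<tid>/stat twice and reports per-thread CPU usage over the interval.

```python
import os
import sys
import time


def parse_task_stat(line):
    """Return (comm, utime + stime in clock ticks) from one task stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate it via the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    comm = line[lparen + 1:rparen]
    rest = line[rparen + 2:].split()
    # rest[0] is field 3 (state), so field 14 (utime) is rest[11]
    # and field 15 (stime) is rest[12].
    return comm, int(rest[11]) + int(rest[12])


def sample(pid):
    """Map each thread ID of pid to (comm, cpu ticks used so far)."""
    ticks = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                ticks[tid] = parse_task_stat(f.read())
        except OSError:
            continue  # thread exited between listdir() and open()
    return ticks


def busiest(before, after, interval, hz):
    """Return (percent, tid, comm) rows, busiest thread first."""
    rows = []
    for tid, (comm, ticks_after) in after.items():
        if tid in before:
            delta = ticks_after - before[tid][1]
            rows.append((100.0 * delta / (hz * interval), tid, comm))
    return sorted(rows, reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])          # PID of the QEMU process (assumed known)
    interval = 5.0                  # seconds between the two samples
    hz = os.sysconf("SC_CLK_TCK")   # clock ticks per second
    first = sample(pid)
    time.sleep(interval)
    second = sample(pid)
    for pct, tid, comm in busiest(first, second, interval, hz):
        print(f"{pct:6.1f}%  tid={tid}  {comm}")
```

A thread sitting near 100% here while top in the guest is idle would match the behaviour described above.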
Further information:
Guest: Fedora 39 with kernel 6.6.8 packages from:
https://koji.fedoraproject.org/koji/buildinfo?buildID=2336239
Host: Proxmox 8.1.3 with kernel 6.5.11-7-pve
Messages when a drive is hot-plugged to the guest via:
# qm set 104 -scsi1 /dev/sde
Dec 21 19:44:02 kernel: pci 0000:09:02.0: [1af4:1004] type 00 class 0x010000
Dec 21 19:44:02 kernel: pci 0000:09:02.0: reg 0x10: [io 0x0000-0x003f]
Dec 21 19:44:02 kernel: pci 0000:09:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
Dec 21 19:44:02 kernel: pci 0000:09:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
Dec 21 19:44:02 kernel: pci 0000:09:02.0: BAR 4: assigned [mem 0xc080004000-0xc080007fff 64bit pref]
Dec 21 19:44:02 kernel: pci 0000:09:02.0: BAR 1: assigned [mem 0xc1801000-0xc1801fff]
Dec 21 19:44:02 kernel: pci 0000:09:02.0: BAR 0: assigned [io 0x6040-0x607f]
Dec 21 19:44:02 kernel: virtio-pci 0000:09:02.0: enabling device (0000 -> 0003)
Dec 21 19:44:02 kernel: scsi host7: Virtio SCSI HBA
Dec 21 19:44:02 kernel: scsi 7:0:0:1: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
Dec 21 19:44:02 kernel: sd 7:0:0:1: Power-on or device reset occurred
Dec 21 19:44:02 kernel: sd 7:0:0:1: Attached scsi generic sg1 type 0
Dec 21 19:44:02 kernel: sd 7:0:0:1: LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.
Dec 21 19:44:02 kernel: sd 7:0:0:1: [sdb] 3906994318 512-byte logical blocks: (2.00 TB/1.82 TiB)
Dec 21 19:44:02 kernel: sd 7:0:0:1: [sdb] Write Protect is off
Dec 21 19:44:02 kernel: sd 7:0:0:1: [sdb] Mode Sense: 63 00 00 08
Dec 21 19:44:02 kernel: sd 7:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 21 19:44:02 kernel: sd 7:0:0:1: [sdb] Attached SCSI disk
Device node is then available as /dev/sdb as expected.
Hot-unplugging the device in Proxmox is done via:
# /usr/sbin/qm set 104 --delete scsi1
where 104 is the VM ID within the Proxmox host. I have been trawling through the Perl code for the `qm` util to see how
that translates to a qemu command, but haven't nailed anything down yet. The code for the qm util is here:
https://git.proxmox.com/?p=qemu-server.git;a=tree;h=refs/heads/master;hb=refs/heads/master
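For anyone wanting to take qm out of the picture, one plausible translation, which is not confirmed against the qm
source, is a QMP device_del request sent to the VM's monitor socket. The sketch below only shows the shape of such a
request. device_del and qmp_capabilities are real QMP commands, but the device id "scsi1", the socket path and the
helper names are assumptions made for illustration.

```python
import json
import socket


def qmp_messages(device_id):
    """Build the QMP handshake plus a device_del request for device_id."""
    return [
        {"execute": "qmp_capabilities"},  # required before any other command
        {"execute": "device_del", "arguments": {"id": device_id}},
    ]


def read_reply(stream):
    """Read lines until a command reply arrives, skipping async events."""
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP socket closed before a reply arrived")
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return msg


def send_qmp(sock_path, messages):
    """Send messages over the QMP UNIX socket and collect the replies."""
    replies = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        stream = sock.makefile("rw")
        json.loads(stream.readline())  # discard the QMP greeting banner
        for msg in messages:
            stream.write(json.dumps(msg) + "\n")
            stream.flush()
            replies.append(read_reply(stream))
    return replies


# Hypothetical usage; the path and the id are assumptions, and running this
# against a live VM would actually request the unplug:
# send_qmp("/path/to/vm-104-qmp.sock", qmp_messages("scsi1"))
```

If the host-side spin can be reproduced this way, it would rule out anything specific to the qm tooling.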
After the qm command is executed, the device node disappears correctly from the running VM, and the VM seems to operate
as normal. The spinning within the KVM thread seems to affect only the host.
--
Steven Haigh
📧 netwiz@xxxxxxxxx
💻 https://crc.id.au