On Tue, 23 May 2023 12:21:25 -0500
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> Hi Nirmal, thanks for the report!
>
> On Mon, May 22, 2023 at 04:32:03PM +0000, bugzilla-daemon@xxxxxxxxxx wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=217472
> > ...
>
> > Created attachment 304301
> >   --> https://bugzilla.kernel.org/attachment.cgi?id=304301&action=edit
> > Rhel9.1_Guest_dmesg
> >
> > Issue:
> > NVMe Drives are still present after performing hotplug in guest OS. We have
> > tested with different combination of OSes, drives and Hypervisor. The issue is
> > present across all the OSes.
>
> Maybe attach the specific commands to reproduce the problem in one of
> these scenarios to the bugzilla?  I'm a virtualization noob, so I
> can't visualize all the usual pieces.
>
> > The following patch was added to honor ACPI _OSC values set by BIOS and the
> > patch helped to bring the issue out in VM/ Guest OS.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/controller/vmd.c?id=04b12ef163d10e348db664900ae7f611b83c7a0e
> >
> >
> > I also compared the values of the parameters in the patch in Host and Guest OS.
> > The parameters with different values in Host and Guest OS are:
> >
> > native_pcie_hotplug
> > native_shpc_hotplug
> > native_aer
> > native_ltr
> >
> > i.e.
> > value of native_pcie_hotplug in Host OS is 1.
> > value of native_pcie_hotplug in Guest OS is 0.
> >
> > I am not sure why "native_pcie_hotplug" is changed to 0 in guest.
> > Isn't it OSC_ managed parameter? If that is the case, it should
> > have same value in Host and Guest OS.
>
> From your dmesg:
>
>   DMI: Red Hat KVM/RHEL, BIOS 1.16.0-4.el9 04/01/2014
>   _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
>   _OSC: platform does not support [PCIeHotplug LTR DPC]
>   _OSC: OS now controls [SHPCHotplug PME AER PCIeCapability]
>   acpiphp: Slot [0] registered
>   virtio_blk virtio3: [vda] 62914560 512-byte logical blocks (32.2 GB/30.0 GiB)
>
> So the DMI ("KVM/RHEL ...") is the BIOS seen by the guest.  Doesn't
> mean anything to me, but the KVM folks would know about it.  In any
> event, the guest BIOS is different from the host BIOS, so I'm not
> surprised that _OSC is different.

Right, the premise of the issue that guest and host should have the
same OSC features is flawed.  The guest is a virtual machine that can
present an entirely different feature set from the host.  A software
hotplug on the guest can occur without any bearing to the slot status
on the host.

> That guest BIOS _OSC declined to grant control of PCIe native hotplug
> to the guest OS, so the guest will use acpiphp (not pciehp, which
> would be used if native_pcie_hotplug were set).
>
> The dmesg doesn't mention the nvme driver.  Are you using something
> like virtio_blk with qemu pointed at an NVMe drive?  And you
> hot-remove the NVMe device, but the guest OS thinks it's still
> present?
>
> Since the guest is using acpiphp, I would think a hot-remove of a host
> NVMe device should be noticed by qemu and turned into an ACPI
> notification that the guest OS would consume.  But I don't know how
> those connections work.

If vfio-pci is involved, a cooperative hot-unplug will attempt to
unbind the host driver, which triggers a device request through vfio,
which is ultimately seen as a hotplug eject operation by the guest.
Surprise hotplugs of assigned devices are not supported.  There's not
enough info in the bz to speculate how this VM is wired or what actions
are taken.  Thanks,

Alex