> > Was the removal a "surprise" removal? Or you mean it was by using
> > 'remove' sysfs file?
> >
> > IIRC surprise removal will need platform firmware support to handle it
> > properly.

Yes, it was a surprise removal.

On Wed, 13 Dec 2023 at 00:37, Alex Williamson
<alex.williamson@xxxxxxxxxx> wrote:
>
> On Tue, 12 Dec 2023 12:29:13 -0600
> Mario Limonciello <mario.limonciello@xxxxxxx> wrote:
>
> > On 12/12/2023 05:32, Ashutosh Sharma wrote:
> > >> This doesn't work, try "echo 1 | sudo tee power" instead.
> > >
> > > This was not a permission issue, I already gave it read/write permission.
> > >
> > > admin@node-4:/sys/bus/pci/slots/14$ sudo echo 1 > power
> > > -bash: power: Permission denied
> > > admin@node-4:/sys/bus/pci/slots/14$ sudo chmod 0666 power
> > > admin@node-4:/sys/bus/pci/slots/14$ sudo echo 1 > power
> > > echo: write error: Operation not permitted
> > > admin@node-4:/sys/bus/pci/slots/14$
> > >
> > >> This is from a "Link up" situation (DLActive+), it would be more
> > >> interesting to get lspci output of the port in a "No link" situation.
> > >
> > > Unfortunately, I did not collect that output before system reboot.
> > >
> > > On Tue, 12 Dec 2023 at 16:29, Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > >>
> > >> On Tue, Dec 12, 2023 at 04:04:41PM +0530, Ashutosh Sharma wrote:
> > >>> Removed one NVMe drive (pci address 0000:83:00.0), it got unbound
> > >>> successfully from "vfio-pci" driver but saw below error in the syslog.
> > >>>
> > >>> can't change power state from D0 to D3hot (config space inaccessible)
> > >>
> > >> This is normal, the drive's config space is inaccessible after removal.
> > >>
> >
> > Was the removal a "surprise" removal? Or you mean it was by using
> > 'remove' sysfs file?
> >
> > IIRC surprise removal will need platform firmware support to handle it
> > properly.
>
> The vfio-pci driver also makes zero claims about supporting surprise
> removal, you'll likely end up in an inconsistent state.
> Thanks,
>
> Alex
>
> > >>> Then after 2:30 min approx, re-inserted the same drive to the same PCI
> > >>> slot. But the drive was not detected.
> > >>>
> > >>> Dec 11 23:54:39 node-4 kernel: [183672.630191] pcieport 0000:80:03.2:
> > >>> pciehp: Slot(14): Attention button pressed
> > >>> Dec 11 23:54:39 node-4 kernel: [183672.630195] pcieport 0000:80:03.2:
> > >>> pciehp: Slot(14) Powering on due to button press
> > >>> Dec 11 23:54:44 node-4 kernel: [183677.671931] pcieport 0000:80:03.2:
> > >>> pciehp: Slot(14): Card present
> > >>> Dec 11 23:54:46 node-4 kernel: [183679.783922] pcieport 0000:80:03.2:
> > >>> pciehp: Slot(14): No link
> > >>
> > >> The link doesn't come up, so the kernel gives up on the slot.
> > >>
> > >> I don't know what the reason is, could be a hardware issue or
> > >> protocol incompatibility. This doesn't look like a kernel issue.
> > >>
> > >>
> > >>> | +-03.0 Advanced Micro Devices, Inc. [AMD]
> > >>> Starship/Matisse PCIe Dummy Host Bridge
> > >>> | +-03.1-[82]----00.0 Samsung Electronics Co Ltd NVMe SSD
> > >>> Controller PM9A1/PM9A3/980PRO
> > >>> | +-03.2-[83]--
> > >>
> > >> Adding Mario, Smita, Yazen from AMD to cc, maybe they have an idea
> > >> what the issue is or how to get diagnostics on this Epyc platform.
> > >>
> > >> Start of thread:
> > >> https://lore.kernel.org/linux-pci/CADOvten7jG7KjW6W1MRd7i8_E18L0xCCaCzmZOY_vvgJhdfOSw@xxxxxxxxxxxxxx/
> > >>
> > >>
> > >>> admin@node-4:/sys/bus/pci/slots/14$ sudo echo 1 > power
> > >>> echo: write error: Operation not permitted
> > >>
> > >> This doesn't work, try "echo 1 | sudo tee power" instead.
> > >>
> > >>
> > >>> lspci output of the pci port:
> > >>> 80:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD]
> > >>> Starship/Matisse GPP Bridge (prog-if 00 [Normal decode])
> > >> [...]
> > >>> LnkSta: Speed 16GT/s (ok), Width x4 (ok)
> > >>> TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
> > >>
> > >> This is from a "Link up" situation (DLActive+), it would be more
> > >> interesting to get lspci output of the port in a "No link" situation.
> > >>
> > >> Thanks,
> > >>
> > >> Lukas
> > >
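
For reference, a minimal sketch of the difference between the two command
forms quoted above. With "sudo echo 1 > power" the redirection is set up by
the calling, unprivileged shell, so only echo runs elevated; with
"echo 1 | sudo tee power" the elevated tee process opens the file itself.
The helper name write_attr and the SUDO override are hypothetical and only
for illustration. This covers only the "Permission denied" case: the
"write error: Operation not permitted" seen after the chmod is reported on
the write itself, after the file was opened, so the sketch does not explain
that one.

```shell
#!/bin/sh
# `sudo echo 1 > power`: the unprivileged shell opens `power` for the
# redirection before sudo starts; only `echo` itself is elevated.
# `echo 1 | sudo tee power`: the elevated `tee` process opens `power`.

# write_attr VALUE FILE
# Hypothetical helper (not from the thread): writes VALUE to FILE via tee.
# SUDO is a hypothetical override; set SUDO="" to run without elevation.
# The function's exit status is tee's, so a rejected write is reported.
write_attr() {
    printf '%s\n' "$1" | ${SUDO-sudo} tee "$2" > /dev/null
}

# Example, using the slot path quoted in the thread:
#   write_attr 1 /sys/bus/pci/slots/14/power
```

If the write is still rejected when done this way, file permissions are not
the cause and the remaining error comes from the write being refused.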