Re: PCI device hot insert is not detected

On Tue, 12 Dec 2023 12:29:13 -0600
Mario Limonciello <mario.limonciello@xxxxxxx> wrote:

> On 12/12/2023 05:32, Ashutosh Sharma wrote:
> >> This doesn't work, try "echo 1 | sudo tee power" instead.  
> > 
> > This was not a permission issue, I already gave it read/write permission.
> > 
> > admin@node-4:/sys/bus/pci/slots/14$ sudo echo 1 > power
> > -bash: power: Permission denied
> > admin@node-4:/sys/bus/pci/slots/14$ sudo chmod 0666 power
> > admin@node-4:/sys/bus/pci/slots/14$ sudo echo 1 > power
> > echo: write error: Operation not permitted
> > admin@node-4:/sys/bus/pci/slots/14$
> >   
> >> This is from a "Link up" situation (DLActive+), it would be more
> >> interesting to get lspci output of the port in a "No link" situation.  
> > 
> > Unfortunately, I did not collect that output before system reboot.
> > 
> > On Tue, 12 Dec 2023 at 16:29, Lukas Wunner <lukas@xxxxxxxxx> wrote:  
> >>
> >> On Tue, Dec 12, 2023 at 04:04:41PM +0530, Ashutosh Sharma wrote:  
> >>> Removed one NVMe drive (pci address 0000:83:00.0), it got unbound
> >>> successfully from "vfio-pci" driver but saw below error in the syslog.
> >>>
> >>> can't change power state from D0 to D3hot (config space inaccessible)  
> >>
> >> This is normal, the drive's config space is inaccessible after removal.
> >>  
> 
> Was the removal a "surprise" removal?  Or you mean it was by using 
> 'remove' sysfs file?
> 
> IIRC surprise removal will need platform firmware support to handle it 
> properly.

The vfio-pci driver also makes zero claims about supporting surprise
removal, you'll likely end up in an inconsistent state.  Thanks,

Alex
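
For contrast with a surprise removal, here is a minimal sketch of an orderly
removal through the per-device 'remove' sysfs file that Mario mentions above.
The helper name pci_orderly_remove and the SYSFS/SUDO override variables are
hypothetical and exist only so the sketch can be dry-run against a scratch
directory; the address 0000:83:00.0 is the drive from this thread. Whether
this is sufficient on this platform is an assumption, not something the
thread confirms.

```shell
#!/bin/sh
# Hypothetical overrides: point SYSFS at a scratch tree and set SUDO to the
# empty string to dry-run without touching real hardware.
SYSFS=${SYSFS-/sys}
SUDO=${SUDO-sudo}

# Ask the kernel to detach the driver and remove the device before the
# drive is physically pulled.
# $1: full PCI address, e.g. 0000:83:00.0
pci_orderly_remove() {
    dev="$SYSFS/bus/pci/devices/$1"
    if [ ! -e "$dev/remove" ]; then
        echo "no such device: $1" >&2
        return 1
    fi
    # tee runs under sudo, so the write itself is privileged
    # (a plain "sudo echo 1 > file" redirects in the unprivileged shell).
    echo 1 | $SUDO tee "$dev/remove" >/dev/null
}

# Usage (on the real system):
#   pci_orderly_remove 0000:83:00.0
```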

> >>> Then after 2:30 min approx, re-inserted the same drive to the same PCI
> >>> slot. But the drive was not detected.
> >>>
> >>> Dec 11 23:54:39 node-4 kernel: [183672.630191] pcieport 0000:80:03.2: pciehp: Slot(14): Attention button pressed
> >>> Dec 11 23:54:39 node-4 kernel: [183672.630195] pcieport 0000:80:03.2: pciehp: Slot(14) Powering on due to button press
> >>> Dec 11 23:54:44 node-4 kernel: [183677.671931] pcieport 0000:80:03.2: pciehp: Slot(14): Card present
> >>> Dec 11 23:54:46 node-4 kernel: [183679.783922] pcieport 0000:80:03.2: pciehp: Slot(14): No link  
> >>
> >> The link doesn't come up, so the kernel gives up on the slot.
> >>
> >> I don't know what the reason is, could be a hardware issue or
> >> protocol incompatibility.  This doesn't look like a kernel issue.
> >>
> >>  
> >>>   |           +-03.0  Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
> >>>   |           +-03.1-[82]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
> >>>   |           +-03.2-[83]--  
> >>
> >> Adding Mario, Smita, Yazen from AMD to cc, maybe they have an idea
> >> what the issue is or how to get diagnostics on this Epyc platform.
> >>
> >> Start of thread:
> >> https://lore.kernel.org/linux-pci/CADOvten7jG7KjW6W1MRd7i8_E18L0xCCaCzmZOY_vvgJhdfOSw@xxxxxxxxxxxxxx/
> >>
> >>  
> >>> admin@node-4:/sys/bus/pci/slots/14$ sudo echo 1 > power
> >>> echo: write error: Operation not permitted  
> >>
> >> This doesn't work, try "echo 1 | sudo tee power" instead.
> >>
> >>  
> >>> lspci output of the pci port:
> >>> 80:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge (prog-if 00 [Normal decode])  
> >> [...]  
> >>>                  LnkSta: Speed 16GT/s (ok), Width x4 (ok)
> >>>                          TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-  
> >>
> >> This is from a "Link up" situation (DLActive+), it would be more
> >> interesting to get lspci output of the port in a "No link" situation.
> >>
> >> Thanks,
> >>
> >> Lukas  
> 
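
The two suggestions quoted from Lukas (writing the slot's power file through
tee, and capturing the port's link state while the slot reports "No link")
could be scripted roughly as below. The helper names slot_power and
link_fields and the SYSFS/SUDO override variables are hypothetical; slot 14
and port 80:03.2 are taken from this thread. Note that using tee only rules
out the shell-redirection problem: the thread reports "Operation not
permitted" from the write itself after the chmod, so the write may still be
refused.

```shell
#!/bin/sh
# Hypothetical overrides so the sketch can be dry-run on a scratch tree.
SYSFS=${SYSFS-/sys}
SUDO=${SUDO-sudo}

# Write 0 or 1 to a hotplug slot's power file.
# $1: slot number as shown under /sys/bus/pci/slots, $2: 0 (off) or 1 (on)
slot_power() {
    # "sudo echo 1 > power" fails because the redirection is done by the
    # calling (unprivileged) shell; piping into "sudo tee" avoids that.
    echo "$2" | $SUDO tee "$SYSFS/bus/pci/slots/$1/power" >/dev/null
}

# Filter lspci -vv output (read from stdin) down to the link and slot
# status lines, including the flag line that carries DLActive.
link_fields() {
    grep -E 'LnkSta:|SltSta:|DLActive'
}

# Usage (on the real system, while the slot reports "No link"):
#   slot_power 14 1
#   sudo lspci -vv -s 80:03.2 | link_fields
```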




