On 12/12/2023 05:32, Ashutosh Sharma wrote:
This doesn't work, try "echo 1 | sudo tee power" instead.
This was not a permission issue; I had already given it read/write permission.
admin@node-4:/sys/bus/pci/slots/14$ sudo echo 1 > power
-bash: power: Permission denied
admin@node-4:/sys/bus/pci/slots/14$ sudo chmod 0666 power
admin@node-4:/sys/bus/pci/slots/14$ sudo echo 1 > power
echo: write error: Operation not permitted
admin@node-4:/sys/bus/pci/slots/14$
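For context on the suggestion above: in "sudo echo 1 > power", the "> power" redirection is performed by the calling, unprivileged shell before sudo runs, which matches the first "-bash: power: Permission denied" error. Piping into "sudo tee" lets a root process open the file instead. Below is a minimal sketch of that pattern. The write_attr helper and the SUDO variable are hypothetical names used only for illustration, and whether this changes the later "Operation not permitted" write error is not established in this thread.

```shell
#!/bin/sh
# write_attr FILE VALUE  (hypothetical helper, not part of any tool)
# Writes VALUE to FILE through tee, so the file is opened by the
# privileged process rather than by the calling shell.
# SUDO defaults to "sudo"; set SUDO= (empty) to run without it.
write_attr() {
    printf '%s\n' "$2" | ${SUDO-sudo} tee "$1" >/dev/null
}

# Example, from within /sys/bus/pci/slots/14:
#   write_attr power 1
```

An equivalent alternative is to run the whole command line, redirection included, as root: sudo sh -c 'echo 1 > power'.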
This is from a "Link up" situation (DLActive+), it would be more
interesting to get lspci output of the port in a "No link" situation.
Unfortunately, I did not collect that output before system reboot.
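If the problem reproduces, the port state could be captured before rebooting. The sketch below classifies the data link layer state from "lspci -vv" text by looking for the DLActive flag shown in the LnkSta output quoted in this thread. The link_state helper is a hypothetical name, and the sketch assumes the flag is printed as "DLActive+" or "DLActive-".

```shell
#!/bin/sh
# link_state  (hypothetical helper)
# Reads `lspci -vv` output on stdin and prints:
#   up      if the first DLActive flag found is "DLActive+"
#   down    if the first DLActive flag found is "DLActive-"
#   unknown if no DLActive flag appears in the input
link_state() {
    awk '
        index($0, "DLActive+") { print "up";   found = 1; exit }
        index($0, "DLActive-") { print "down"; found = 1; exit }
        END { if (!found) print "unknown" }
    '
}

# Example for the port discussed here (root is needed to read the
# PCIe capability registers):
#   sudo lspci -vv -s 80:03.2 | link_state
```

Saving the full "sudo lspci -vv -s 80:03.2" output to a file while the slot reports "No link" would give the most complete picture. The helper only summarizes one flag.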
On Tue, 12 Dec 2023 at 16:29, Lukas Wunner <lukas@xxxxxxxxx> wrote:
On Tue, Dec 12, 2023 at 04:04:41PM +0530, Ashutosh Sharma wrote:
Removed one NVMe drive (PCI address 0000:83:00.0). It was unbound
successfully from the "vfio-pci" driver, but the error below appeared in the syslog.
can't change power state from D0 to D3hot (config space inaccessible)
This is normal, the drive's config space is inaccessible after removal.
Was the removal a "surprise" removal? Or do you mean it was done by using
the 'remove' sysfs file?
IIRC surprise removal needs platform firmware support to be handled
properly.
Then, after approx. 2:30 min, re-inserted the same drive into the same PCI
slot, but the drive was not detected.
Dec 11 23:54:39 node-4 kernel: [183672.630191] pcieport 0000:80:03.2: pciehp: Slot(14): Attention button pressed
Dec 11 23:54:39 node-4 kernel: [183672.630195] pcieport 0000:80:03.2: pciehp: Slot(14) Powering on due to button press
Dec 11 23:54:44 node-4 kernel: [183677.671931] pcieport 0000:80:03.2: pciehp: Slot(14): Card present
Dec 11 23:54:46 node-4 kernel: [183679.783922] pcieport 0000:80:03.2: pciehp: Slot(14): No link
The link doesn't come up, so the kernel gives up on the slot.
I don't know what the reason is, could be a hardware issue or
protocol incompatibility. This doesn't look like a kernel issue.
| +-03.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
| +-03.1-[82]----00.0 Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
| +-03.2-[83]--
Adding Mario, Smita, Yazen from AMD to cc, maybe they have an idea
what the issue is or how to get diagnostics on this Epyc platform.
Start of thread:
https://lore.kernel.org/linux-pci/CADOvten7jG7KjW6W1MRd7i8_E18L0xCCaCzmZOY_vvgJhdfOSw@xxxxxxxxxxxxxx/
admin@node-4:/sys/bus/pci/slots/14$ sudo echo 1 > power
echo: write error: Operation not permitted
This doesn't work, try "echo 1 | sudo tee power" instead.
lspci output of the pci port:
80:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge (prog-if 00 [Normal decode])
[...]
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
This is from a "Link up" situation (DLActive+), it would be more
interesting to get lspci output of the port in a "No link" situation.
Thanks,
Lukas