RE: Issue about PCI physical slot fetch incorrect number

Hi Bjorn,
Sorry for the late response. We needed some time to collect the information from the other teams, which is why I am getting back to you after almost a week.

[Bjorn]I don't know off the top of my head why lspci doesn't report a "Physical Slot:" for 21:00.0.  I suppose the kernel didn't provide something in /sys for it. All the other "Physical Slot:" reports from lspci match the "Physical Slot Number" from the PCIe Capability of the bridge leading to the slot, *except* for 33:00.0.  In that case, the "Physical Slot Number" from the bridge PCIe Capability is not unique.  Both 20:01.1 and 32:00.0 advertise Slot #0 there, so the kernel makes the sysfs slot name unique, e.g., "0-6".

->I think we are on the same page about this. I am wondering why 33:00.0, which is a GPU device, is the one exception that does not get the correct physical slot number. As you said, the GPU at 33:00.0 shows physical slot number 0 because its bridge does not have a proper slot number.

[Bjorn]From lspci_vvv_team2.txt: This seems strange to me. For 39:01.0, lspci reports "Physical Slot: 24", but 39:01.0 is a Downstream Port that *leads* to a slot; it's not a slot itself.  3b:00.0 is the device in that slot, and I think it should have a slot number, but it doesn't. Similarly, lspci reports "Physical Slot: 39" for 39:02.0, when it should show 3e:00.0 being in slot 39. I guess this team2 situation is what you're trying to understand?

->I understand your point. Yes, this situation is also what I want to clarify.

To be brief, there are two questions I would like to ask:
1. On team1, all devices except the GPU get the correct physical slot number. Because the GPU's bridge does not have a slot number, the slot number reported for the downstream end device becomes "0", which I believe is not the correct slot number.
2. On team2, the slot number shows up on the downstream port instead of on the end device itself. Is this the reason why their GPU device doesn't have a slot number, because the slot number is shown on the downstream port?

[Bjorn]Can you collect the complete dmesg log and output of "grep -r . /sys/bus/pci/slots" for both team1 and team2?  We should be able to puzzle out what's going on.  The dmesg logging will show which hotplug drivers are in use and should have hints about slot numbering, and if it doesn't, we may need to add some.

->As you requested, we collected the dmesg logs from team1 and team2. Please take a look and let us know what we can do next to fix this issue.
If you need any further information, please feel free to tell me. I will do my best to get back to you as soon as possible.

I will send both the dmesg logs and the slot files under /sys/bus/pci from team1 and team2 to you. Please see the attachments for more details. I really appreciate your help and assistance.
Hope to hear from you soon.

BR,
Erin
-----Original Message-----
From: Bjorn Helgaas <helgaas@xxxxxxxxxx> 
Sent: Friday, August 30, 2024 12:35 AM
To: Erin Tsao/WHQ/Wistron <Erin_Tsao@xxxxxxxxxxx>
Cc: linux-pci@xxxxxxxxxxxxxxx; mj@xxxxxx
Subject: Re: Issue about PCI physical slot fetch incorrect number

On Mon, Aug 26, 2024 at 08:27:09AM +0000, Erin_Tsao@xxxxxxxxxxx wrote:
> Hi Bjorn,
> Sorry for the late response. And thanks for responding to my question.
> There are a few things I would like to clarify with you.
> 1. Is the physical slot number associated with the configuration of the
> device itself or with the configuration of the device's parent?

A PCIe device doesn't know its own slot number.  The bridge leading to a slot (either a Root Port or a Switch Downstream Port) has the Slot Capability/Status/Control registers that manage the slot.  The Slot Capabilities register contains a "Physical Slot Number".  This is HwInit, which means it's set by hardware or firmware, and it's supposed to be a number that's unique within the chassis.
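To make the register layout concrete, here is a minimal sketch of pulling the "Physical Slot Number" out of a raw 32-bit Slot Capabilities value. It assumes the field occupies bits 31:19, as the PCIe specification and the kernel's PCI_EXP_SLTCAP_PSN mask (0xfff80000) define it. The function name and the sample values are hypothetical and are not taken from the team1 or team2 logs.

```python
# Physical Slot Number field of the PCIe Slot Capabilities register.
# Assumption: bits 31:19, matching the kernel's PCI_EXP_SLTCAP_PSN mask.
SLTCAP_PSN_MASK = 0xFFF80000
SLTCAP_PSN_SHIFT = 19


def physical_slot_number(sltcap: int) -> int:
    """Return the Physical Slot Number from a raw Slot Capabilities value."""
    return (sltcap & SLTCAP_PSN_MASK) >> SLTCAP_PSN_SHIFT


if __name__ == "__main__":
    # Hypothetical register value: slot number 24 with some low bits set.
    sample = (24 << SLTCAP_PSN_SHIFT) | 0x40
    print(physical_slot_number(sample))
```

A bridge whose firmware never programmed this field reads back 0 here, which would be consistent with two bridges both advertising Slot #0.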

The "Physical Slot" reported by lspci for Endpoints comes from sysfs, not from the device itself.  See https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/tree/lib/sysfs.c?id=v3.13.0#n277
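A rough sketch of that sysfs lookup, for checking by hand which devices would get a "Physical Slot:" line. It assumes each directory under /sys/bus/pci/slots holds an "address" file in domain:bus:device form (no function number) and that the directory name is the slot name; this is my reading of the pciutils code, not a copy of it. The helper names read_slot_map and slot_for_device are hypothetical.

```python
import os


def read_slot_map(slots_dir="/sys/bus/pci/slots"):
    """Return {address: [slot names]} built from the sysfs slot directory."""
    slot_map = {}
    if not os.path.isdir(slots_dir):
        return slot_map
    for name in sorted(os.listdir(slots_dir)):
        addr_path = os.path.join(slots_dir, name, "address")
        try:
            with open(addr_path) as f:
                address = f.read().strip()
        except OSError:
            # Slot entry without a readable address file: skip it.
            continue
        slot_map.setdefault(address, []).append(name)
    return slot_map


def slot_for_device(bdf, slot_map):
    """Look up a device such as '0000:33:00.0'; return a slot name or None."""
    # Assumption: slot addresses omit the function, so drop the '.N' suffix.
    address = bdf.rsplit(".", 1)[0]
    names = slot_map.get(address)
    return names[0] if names else None


if __name__ == "__main__":
    slots = read_slot_map()
    for address, names in sorted(slots.items()):
        print(address, ",".join(names))
```

A device for which slot_for_device returns None has no matching entry in sysfs, which is the case lspci would show without a "Physical Slot:" line.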

