On Tue, Nov 28, 2023 at 08:25:16PM +0800, wangdong28 wrote: > From: Dong Wang <wangdong28@xxxxxxxxxx> > > When enabling VMD function in UEFI setup, the physical slot of the M.2 > NVMe device connected to the VMD device cannot be detected. Here is > the result from lspci ("Physical Slot" field is NOT shown): Apparently you're referring to the Physical Slot Number in the Slot Capabilities register of a VMD Root Port? That would not be related to the NVMe device or whatever is *connected* to that Root Port. It's important to use the specific names to understand what's happening here. > 10001:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe > Datacenter SSD [3DNAND, Beta Rock Controller] (prog-if 02 [NVM Express]) > Subsystem: Intel Corporation NVMe Datacenter SSD [3DNAND] SE M.2 (P4511) > > Generally, the physical slot (/sys/bus/pci/slots) will be created via > either ACPI walking path during kernel init or hotplug path: > > ACPI walking path: > pcibios_add_bus > acpi_pci_add_bus > acpi_pci_slot_enumerate > acpi_walk_namespace > register_slot > pci_create_slot IIUC this path registers the slot with a slot number from the ACPI _SUN method (register_slot() calls check_slot() to evaluate _SUN). > hotplug path: > __pci_hp_initialize > pci_create_slot IIUC this path registers the slot with a slot number from PCI_EXP_SLTCAP_PSN (see init_slot() in pciehp). > [M.2 NVMe Device] > A. VMD disabled > When VMD is disabled, NVMe will be discovered during bus scanning and > recognized as acpi device. In this case, the physical slot is created > via the ACPI walking path. s/acpi device/ACPI device/ > B. VMD enabled > vmd_enable_domain() invokes pcibios_add_bus(). This means that it goes > through the ACPI walking path. However, acpi_pci_add_bus() returns > directly becase the statment "!ACPI_HANDLE(bus->bridge)" is true. > See the following code snippet: s/becase/because/ s/statment/statement/ > void acpi_pci_add_bus(struct pci_bus *bus) > { > ... > if (acpi_pci_disabled || !bus->bridge || !ACPI_HANDLE(bus->bridge)) > return; > ... > } > > Since VMD creates its own root bus and devices of VMD are attached to > the bus, those devices are non-ACPI devices. That's why > "!ACPI_HANDLE(bus->bridge)" returns true. > > In addition, M.2 NVMe devices does not have the hotplug capability. > Here is the quote from PCI Express M.2 Specification (Revision 5.0, > Version 1.0): > > CAUTION: M.2 Add-in Cards are not designed or intended to support > Hot-Swap or Hot-Plug connections. Performing Hot-Swap or Hot-Plug > may pose danger to the M.2 Add-in Card, to the system Platform, > and to the person performing this act. Why do we care about hotplug? I don't think the Physical Slot Number depends on hotplug support. But it does look like we don't look at PCI_EXP_SLTCAP_PSN except in pciehp, so if a slot doesn't support hotplug, I guess we probably don't expose the slot in /sys/bus/pci/slots/. I dunno whether that's the right thing or not. I can see that it might be useful to know what physical slot a device is in even if the slot doesn't support hotplug. And it seems that's what you want to do here? You want to expose the "slot" number of a particular M.2 connector, even though the socket doesn't support hotplug? > M.2 NVMe devices (non-ACPI devices and no hotplug capability) connected > to the VMD device cannot meet the above-mentioned paths. The corresponding > slot info of the M.2 NVMe controller cannot be created in > /sys/bus/pci/slots. > > Fix this issue by checking the available physical slot number in > slot capabilities register. If the physical slot number is available, > create the slot info accordingly. The following lspci output shows the > available slot info with applying this patch: s/physical slot number/Physical Slot Number/ s/slot capabilities/Slot Capabilities/ Capitalize them so we know they refer specifically to things in the PCIe spec. > 10001:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe > Datacenter SSD [3DNAND, Beta Rock Controller] (prog-if 02 [NVM Express]) > Subsystem: Intel Corporation NVMe Datacenter SSD [3DNAND] SE M.2 (P4511) > Physical Slot: 16 > > [U.2 NVMe device] > A. VMD disabled > Same as M.2 NVMe Device case "A". > > B. VMD enabled > Same as M.2 NVMe Device case "B". > > The hotplug of the U.2 device is optional (See "PCI Express SFF-8639 Module > Specification" for detail). The U.2 NVMe controller with hotplug capability > connected to the VMD device can meet the hotplug path, so the slot info can > be shown correctly via the lspci utility (without this patch): > > 10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe > Datacenter SSD [3DNAND, Beta Rock Controller] (prog-if 02 [NVM Express]) > Subsystem: Lenovo Thinksystem U.2 P4610 NVMe SSD > Physical Slot: 64 > > For U.2 NVMe controller without hotplug capability, this patch is needed > to fix the missing slot info. So it seems like the question is whether we want to expose the slot number in *general* (not just for M.2, U.2, etc) even when the slot does not support hotplug. The same would apply to normal PCIe slots in a server or desktop, where you would put a GPU, NIC, etc, etc. M.2 is a form factor specification that really doesn't have anything to do with the enumeration and configuration done by software. As far as I can tell, there's nothing special about M.2 that justifies this patch, so while M.2 might be an *example* of a case where this is useful, it's not the *reason* for making a change like this. > Suggested-and-reviewed-by: Adrian Huang <ahuang12@xxxxxxxxxx> > Signed-off-by: Dong Wang <wangdong28@xxxxxxxxxx> > --- > v2: > * Fix the build error for non-x86 arch > > --- > arch/x86/pci/common.c | 21 +++++++++++++++++++++ > drivers/pci/pci-acpi.c | 9 ++++++++- > include/linux/pci.h | 1 + > 3 files changed, 30 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c > index ddb7986..b657b07 100644 > --- a/arch/x86/pci/common.c > +++ b/arch/x86/pci/common.c > @@ -731,4 +731,25 @@ struct pci_dev *pci_real_dma_dev(struct pci_dev *dev) > > return dev; > } > + > +#define SLOT_NAME_SIZE 5 > + > +void pci_check_extra_slot_register(struct pci_bus *bus) > +{ > + struct pci_dev *pdev = bus->self; > + char slot_name[SLOT_NAME_SIZE]; > + struct pci_slot *pci_slot; > + u32 slot_cap, slot_nr; > + > + if (!is_vmd(bus) || !pdev || pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &slot_cap)) > + return; > + > + if (!(slot_cap & PCI_EXP_SLTCAP_HPC)) { > + slot_nr = (slot_cap & PCI_EXP_SLTCAP_PSN) >> 19; > + snprintf(slot_name, SLOT_NAME_SIZE, "%u", slot_nr); > + pci_slot = pci_create_slot(bus, 0, slot_name, NULL); > + if (IS_ERR(pci_slot)) > + pr_err("pci_create_slot returned %ld\n", PTR_ERR(pci_slot)); This patch basically reads the Physical Slot Number in the Slot Capabilities register in a case where we didn't previously do that. I guess we're reading PCI_EXP_SLTCAP from a VMD Root Port? And we currently don't do that for some reason? I assume the same exact problem would occur if that VMD Root Port were connected to an ordinary PCIe slot? > + } > +} > #endif > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c > index 0045750..e2f2ba8 100644 > --- a/drivers/pci/pci-acpi.c > +++ b/drivers/pci/pci-acpi.c > @@ -884,6 +884,8 @@ acpi_status pci_acpi_add_pm_notifier(struct acpi_device *dev, > return acpi_add_pm_notifier(dev, &pci_dev->dev, pci_acpi_wake_dev); > } > > +void __weak pci_check_extra_slot_register(struct pci_bus *bus) { } > + > /* > * _SxD returns the D-state with the highest power > * (lowest D-state number) supported in the S-state "x". > @@ -1202,9 +1204,14 @@ void acpi_pci_add_bus(struct pci_bus *bus) > union acpi_object *obj; > struct pci_host_bridge *bridge; > > - if (acpi_pci_disabled || !bus->bridge || !ACPI_HANDLE(bus->bridge)) > + if (acpi_pci_disabled || !bus->bridge) > return; > > + if (!ACPI_HANDLE(bus->bridge)) { > + pci_check_extra_slot_register(bus); > + return; > + } > + > acpi_pci_slot_enumerate(bus); > acpiphp_enumerate_slots(bus); > > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 60ca768..b9bb447 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -1394,6 +1394,7 @@ static inline int pci_rebar_bytes_to_size(u64 bytes) > bool pci_device_is_present(struct pci_dev *pdev); > void pci_ignore_hotplug(struct pci_dev *dev); > struct pci_dev *pci_real_dma_dev(struct pci_dev *dev); > +void pci_check_extra_slot_register(struct pci_bus *bus); > int pci_status_get_and_clear_errors(struct pci_dev *pdev); > > int __printf(6, 7) pci_request_irq(struct pci_dev *dev, unsigned int nr, > -- > 1.8.3.1 >