Re: [PATCH 1/2 v2] PCI/portdrv: Enable reporting inter-switch P2P links

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 19 Sep 2024 01:13:43 -0700
Shivasharan S <shivasharan.srikanteshwara@xxxxxxxxxxxx> wrote:

> Broadcom PCI switches supports inter-switch P2P links between two
> PCI-to-PCI bridges. This presents an optimal data path for data
> movement. The patch exports a new sysfs entry for PCI devices that
> support the inter switch P2P links and reports the B:D:F information
> of the devices that are connected through this inter switch link as
> shown below:
> 
>                              Host root bridge
>                 ---------------------------------------
>                 |                                     |
>   NIC1 --- PCI Switch1 --- Inter-switch link --- PCI Switch2 --- NIC2
> (2c:00.0)   (2a:00.0)                             (3d:00.0)   (40:00.0)
>                 |                                     |
>                GPU1                                  GPU2
>             (2d:00.0)                             (3f:00.0)
>                                SERVER 1
> 
> $ find /sys/ -name "p2p_link" | xargs grep .
> /sys/devices/pci0000:29/0000:29:01.0/0000:2a:00.0/p2p_link:0000:3d:00.0
> /sys/devices/pci0000:3c/0000:3c:01.0/0000:3d:00.0/p2p_link:0000:2a:00.0
> 
> Current support is added to report the P2P links available for
> Broadcom switches based on the capability that is reported by the
> upstream bridges through their vendor-specific capability registers.
> 
> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@xxxxxxxxxxxx>
> Signed-off-by: Sumanesh Samanta <sumanesh.samanta@xxxxxxxxxxxx>
> ---
> Changes in v2:
> Integrated the code into PCIe portdrv to create the sysfs entries during
> probe, as suggested by Mani.

Hmm. So we are trying to rework portdrv in general to support extensibility.
I'm a little nervous about taking in vendor specific code in the meantime
even if it is trivial like this is.  We will be having an extensible
discovery scheme but for now the plan is that will be child device based
for non PCI standard features.

It is a fairly small bit of code, so maybe it is fine - I'm not keen
on adding the implementation of the vendor specific parts to the
main driver though. Push that to an optional c file.

A few general comments inline.

> 
>  Documentation/ABI/testing/sysfs-bus-pci |  14 +++
>  drivers/pci/pcie/portdrv.c              | 131 ++++++++++++++++++++++++
>  drivers/pci/pcie/portdrv.h              |  10 ++
>  3 files changed, 155 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index ecf47559f495..e5d02f20655f 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -572,3 +572,17 @@ Description:
>  		enclosure-specific indications "specific0" to "specific7",
>  		hence the corresponding led class devices are unavailable if
>  		the DSM interface is used.
> +
> +What:		/sys/bus/pci/devices/.../p2p_link
> +Date:		September 2024
> +Contact:	Shivasharan S <shivasharan.srikanteshwara@xxxxxxxxxxxx>
> +Description:
> +		This file appears on PCIe upstream ports which supports an
> +		internal P2P link.
> +		Reading this attribute will provide the list of other upstream
> +		ports on the system which have an internal P2P link available
> +		between the two ports.

Given this only applies to 'internal' links and not inter switch physical links
I think you should rename it.  An eventual p2p link between physical switches
is going to be much more complex as will need routing information.
Let us avoid trampling on that space.

> +Users:
> +		Userspace applications interested in determining a optimal P2P
> +		link for data transfers between devices connected to the PCIe
> +		switches.

Need more data that 'there is a link' for this.
I'd like to see some info on bandwidth and latency. I've previously been
in discussions about similar devices that provide a low latency but low
bandwidth direct path.  That gets more likely if we scale up to
multiple physical switches or the software stack is choosing between
multiple p2p targets (e.g. getting nearest path to a multiheaded NIC).

Perhaps best bet is to leave space for that by using a directory
here to cover everything about p2p?  Maybe have links under there to the
other upstream ports? That might be fiddly to manage though.

> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 6af5e0425872..c940b4b242fd 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -18,6 +18,7 @@
>  #include <linux/string.h>
>  #include <linux/slab.h>
>  #include <linux/aer.h>
> +#include <linux/bitops.h>
>  
>  #include "../pci.h"
>  #include "portdrv.h"
> @@ -37,6 +38,134 @@ struct portdrv_service_data {
>  	u32 service;
>  };
>  
> +/**
> + * pcie_brcm_is_p2p_supported - Broadcom device specific handler
> + *       to check if the upstream port supports inter switch P2P.
> + *
> + * @dev: PCIe upstream port to check
> + *
> + * This function assumes the PCIe upstream port is a Broadcom
> + * PCIe device.
> + */
> +static bool pcie_brcm_is_p2p_supported(struct pci_dev *dev)

Put this in a separate c file + use a config option and some
stubs to make it go away if people don't want to support it.
The layering and separation in portdrv is currently messy but
we shouldn't make it worse :(

> +{
> +	u64 dsn;
> +	u16 vsec;
> +	u32 vsec_data;
> +
> +	dsn = pci_get_dsn(dev);
> +	if (!dsn) {
> +		pci_dbg(dev, "DSN capability is not present\n");
> +		return false;
> +	}

Why get the dsn (which will frequently exist on other devices)
before getting the vsec which won't?  Reorder these first
two checks. For most devices the match on vendor will fail in the
vsec check.

> +
> +	vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_LSI_LOGIC,
> +					PCIE_BRCM_SW_P2P_VSEC_ID);
> +	if (!vsec) {
> +		pci_dbg(dev, "Failed to get VSEC capability\n");
> +		return false;
> +	}
> +
> +	pci_read_config_dword(dev, vsec + PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET,
> +			      &vsec_data);
> +
> +	pci_dbg(dev, "Serial Number: 0x%llx VSEC 0x%x\n",
> +		dsn, vsec_data);
> +
> +	if (!PCIE_BRCM_SW_IS_SECURE_PART(dsn))

Add a comment on this. Not obvious what this is checking as it's picking
a single bit out of a serial number...

> +		return false;
> +
> +	if (FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) !=
> +		PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK)
> +		return false;
> +
> +	return true;
	return FIELD_GET(PCIE_BRCM_SW_P2P_MODE_MASK, vsec_data) ==
	       PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK;
perhaps.

> +}
> +
> +/**
> + * Determine if device supports Inter switch P2P links.
> + *
> + * Return value: true if inter switch P2P is supported, return false otherwise.
> + */
> +static bool pcie_port_is_p2p_supported(struct pci_dev *dev)
> +{
> +	/* P2P link attribute is supported on upstream ports only */
> +	if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
> +		return false;
> +
> +	/*
> +	 * Currently Broadcom PEX switches are supported.
> +	 */
> +	if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC &&
> +	    (dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_HLC ||
> +	     dev->device == PCI_DEVICE_ID_BRCM_PEX_89000_LLC))
> +		return pcie_brcm_is_p2p_supported(dev);
> +
> +	return false;
> +}
> +
> +/*
> + * Traverse list of all PCI bridges and find devices that support Inter switch P2P
> + * and have the same serial number to create report the BDF over sysfs.
> + */
> +static ssize_t p2p_link_show(struct device *dev, struct device_attribute *attr,
> +			     char *buf)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev), *pdev_link = NULL;
> +	size_t len = 0;
> +	u64 dsn, dsn_link;
> +
> +	dsn = pci_get_dsn(pdev);

Maybe add a comment that we don't need to repeat checks that were done
to make the attribute visible. Hence dsn will exist and it will be p2p link capable.

> +
> +	/* Traverse list of PCI bridges to determine any available P2P links */
> +	while ((pdev_link = pci_get_class(PCI_CLASS_BRIDGE_PCI << 8, pdev_link))

Feels like we should have a for_each_pci_bridge() similar to for_each_pci_dev()
that does this, but that is already defined to mean something else...

Bjorn, is this something we should be looking to make more consistent
perhaps with naming to make it clear what is a search of all instances
on any bus and what is a search below a particular bus?

> +			!= NULL) {
> +		if (pdev_link == pdev)
> +			continue;
> +
> +		if (!pcie_port_is_p2p_supported(pdev_link))
> +			continue;
> +
> +		dsn_link = pci_get_dsn(pdev_link);
> +		if (!dsn_link)
> +			continue;
> +
> +		if (dsn == dsn_link)
> +			len += sysfs_emit_at(buf, len, "%04x:%02x:%02x.%d\n",
> +					     pci_domain_nr(pdev_link->bus),
> +					     pdev_link->bus->number, PCI_SLOT(pdev_link->devfn),
> +					     PCI_FUNC(pdev_link->devfn));
> +	}
> +
> +	return len;
> +}


> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index 12c89ea0313b..1be06cb45665 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -25,6 +25,16 @@
>  
>  #define PCIE_PORT_DEVICE_MAXSERVICES   5
>  
> +/* P2P Link supported device IDs */
> +#define PCI_DEVICE_ID_BRCM_PEX_89000_HLC	0xC030
> +#define PCI_DEVICE_ID_BRCM_PEX_89000_LLC	0xC034
> +
> +#define PCIE_BRCM_SW_P2P_VSEC_ID		0x1
> +#define PCIE_BRCM_SW_P2P_MODE_VSEC_OFFSET	0xC
> +#define PCIE_BRCM_SW_P2P_MODE_MASK		GENMASK(9, 8)
> +#define PCIE_BRCM_SW_P2P_MODE_INTER_SW_LINK	0x2
> +#define PCIE_BRCM_SW_IS_SECURE_PART(dsn)	((dsn) & 0x8)
Define the mask, and use FIELD_GET() to get that.
> +
>  extern bool pcie_ports_dpc_native;
>  
>  #ifdef CONFIG_PCIEAER





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux