Focus of this RFC is CXL Performance Monitoring Units in CXL Root Ports + Switch USP and DSPs. The final patch moving AER to the aux bus is in the category of 'why not try this' rather than something I feel is a particularly good idea. I would like to get opinions on the various ways to avoid the double bus situation introduced here. Some comments on other options for those existing 'pci_express' bus devices at the end of this cover letter. The primary problem this RFC tries to solve is providing a means to register the CXL PMUs found in devices which will be bound to the PCIe portdrv. The proposed solution is to avoid the limitations of the existing pcie service driver approach by bolting on support for devices registered on the auxiliary_bus. Background ========== When the CXL PMU driver was added, only the instances found in CXL type 3 memory devices (end points) were supported. These PMUs also occur in CXL root ports, and CXL switch upstream and downstream ports. Adding support for these was deliberately left for future work because theses ports are all bound to the pcie portdrv via the appropriate PCI class code. Whilst some CXL support of functionality on RP and Switch Ports is handled by the CXL port driver, that is not bound to the PCIe device, is only instantiated as part of enumerating endpoints, and cannot use interrupts because those are in control of the PCIe portdrv. A PMU with out interrupts would be unfortunate so a different solution is needed. Requirements ============ - Register multiple CXL PMUs (CPMUs) per portdrv instance. - portdrv binds to the PCIe class code for PCI_CLASS_BRIDGE_PCI_NORMAL which covers any PCI-Express port. - PCI MSI / MSIX message based interrupts must be registered by one driver - in this case it's the portdrv. - portdrv goes through a dance to discover the minimal number of irq vectors required, and as part of this it needs to find out the highest vector number used. - CXL PMUs store the interrupt message number (the one needed to find the above number of vectors registered) in a register in a BAR. That BAR + offset of the register block is discovered from the CXL Register Locator DVSEC. As such finding the maximum interrupt message number used by any CPMU requires checking all CXL PMU register blocks in the CXL Register Locator DVSEC, briefly mapping their register space and checking that one register. This is safe to do because the rest of the CXL stack doesn't map this section of the BAR and the CXL PMU device is not added until after we have unmapped it again. Dependency fun. - The existing CXL_PMU driver registers the device on the CXL_BUS. - portdrv is not modular. - CONFIG_CXL_BUS is potentially modular (i.e. not available until the core CXL module is loaded). - The portdrv can't be dependent on CONFIG_CXL_BUS directly without forcing CXL to be built in. There are various ways to solve this dependency issue an meet the above requirements. 1. Add an intermediate glue device that has a driver that can depend on CONFIG_CXL_BUS and hence ensure that is loaded before doing a full enumeration of the CXL PMU instances and registering appropriate devices. To allow for simple tear down, device managed interfaces are used against the platform driver, so that when it goes away it removes the CPMU device instances cleanly. The parents are set one level higher however, so that path can be used to get to the underlying PCI Device (which can't go away without tearing everything down as it's the grandparent device). On CXL BUS _____________ __ ________________ __ ________ __________ | Port | | creates | | | creates | | | | PCI Dev | |--------->| PCIE CPMU GLUE | |-------->| CPMU A | | |_____________| | | Platform Device| | |________| | |pordrv binds | |________________| | | perf/cxlpmu binds | |________________| |cxlpmu_glue binds | |___________________| | enumerates CPMUs | ________ __________ Deps /|___________________|-------->| | | / | CPMU B | | cxl_core.ko |________| | / providing CXL_BUS | perf/cxlpmu binds | Deps/ |___________________| _____________ __ / | Type 3 | | creates ________ _________ | PCI Dev | |--------------------------------------->| | | |_____________| | | CPMU C | | | cxl_pci binds | |________| | |________________| | perf/cxpmu binds | |__________________| 2. Make the CXL_PMU driver handle multiple buses. This will look similar to a sensor driver with I2C and SPI drivers but in this case register as a driver for devices on the CXL_BUS and one for another 'new' bus (e.g. the auxiliary bus which exists to enable this sort of hybrid driver) Register auxiliary devices directly from portdrv. This has to be done in multiple steps so enough interrupt vectors are allocated before the CPMU device is added and that driver might probe. On auxiliary bus, children of the portdrv. _____________ __ ________ __________ | Port | | creates | | | | PCI Dev | |--------->| CPMU A | | |_____________| | |________| | |pordrv binds | | perf/cxlpmu binds | |________________| |___________________| \ \ \ ________ __________ \ | | | ------>| CPMU A | | |________| | | perf/cxlpmu binds | |___________________| AND cxl_core.ko / providing CXL_BUS Deps/ On CXL BUS _____________ __ / | Type 3 | | creates ________ _________ | PCI Dev | |--------------------------------------->| | | |_____________| | | CPMU C | | | cxl_pci binds | |________| | |________________| | perf/cxpmu binds | |__________________| I have code for both and on balance option 2 is cleaner as no magic glue devices are registered so that's what is presented in this RFC AER experiment ============== Note that this discussion oddly enough doesn't apply to the the CXL PMU because we need a bus structure of some type to solve the registration dependency problems. The existing portdrv subdrivers (or at least those I've looked at in detail) are much simpler. Having two different types of bus under a portdrv instance is far from ideal. Given I'm adding an auxbus and I wanted to poke some of the corners + I have suitable CXL error injection support in QEMU that lets me exercise the AER flows I decided to play a bit with that. Note that there is some power management support in this patch set and I hacked in callbacks to the AER driver to test that but it is infrastructure that AER itself doesn't need. In previous discussions various approaches have been proposed. 1) Auxiliary bus as done here. It works, but is messy and there is realtively little point if the AER driver is always forced to be built in and load immediately anyway. Is there any interest in making the service drivers modular? 2) Turning the service drivers into straight forward library function calls. This basically means providing some storage for their state and calling the bits of probe and remove not related to the struct device (which will no longer exist) directly. I'm happy to look at either, but keen that if possible we don't gate the CPMU support on that. There have been discusisons around the need for vendor specific telemetry solutions on CXL switches and to try and encourage the upstream approach, I'd like to ensure we have good support the facilities in the CXL specification. Side question. Does anyone care if /sys/bus/pci_express goes away? - in theory it's ABI, but in practice it provides very little actual ABI. If we do need to maintain that, refactoring this code gets much harder and I suspect our only way forwards is a cut and paste version of the auxiliary_bus that gives the flexibility needed but would still provide that /sys/bus/pci_express. Not nice if we can avoid it! Other misc questions / comments. - Can we duplicate the irq discovery into the auxiliary_bus drivers? They do need to be discovered in pordrv to work out how many vectors to request, but the current multipass approach of (1. Discovery, 2. IRQ pass one, 3. Store the IRQs) means auxiliary devices can only be created in an additional pass 4 unless we don't pass them the irqs and allow the child device drivers to query it directly when they load. Note this also avoids the theoretical possiblity that they move when portdrv reduces the requested vectors (to avoid waste). - Additional callbacks for auxiliary_driver needed for runtime pm. For now I've just done suspend and resume, but we need the noirq variant and runtime_suspend / runtime_resume callbacks to support all the existing pcie_driver handled subdevices. - Worth moving any of the drivers that register in parallel with the pcie portdrv over to this new approach (the Designware PMU for example?) Jonathan Cameron (9): pci: pcie: Drop priv_data from struct pcie_device and use dev_get/set_drvdata() instead. pci: portdrv: Drop driver field for port type. pci: pcie: portdrv: Use managed device handling to simplify error and remove flows. auxiliary_bus: expose auxiliary_bus_type pci: pcie: portdrv: Add a auxiliary_bus cxl: Move CPMU register definitions to header pci: pcie/cxl: Register an auxiliary device for each CPMU instance perf: cxl: Make the cpmu driver also work with auxiliary_devices pci: pcie: portdrv: aer: Switch to auxiliary_bus drivers/base/auxiliary.c | 2 +- drivers/cxl/pmu.h | 54 +++++++ drivers/pci/hotplug/pciehp_core.c | 13 +- drivers/pci/hotplug/pciehp_hpc.c | 2 +- drivers/pci/pci-driver.c | 4 - drivers/pci/pcie/Kconfig | 10 ++ drivers/pci/pcie/Makefile | 1 + drivers/pci/pcie/aer.c | 68 +++++--- drivers/pci/pcie/cxlpmu.c | 129 +++++++++++++++ drivers/pci/pcie/dpc.c | 1 - drivers/pci/pcie/pme.c | 14 +- drivers/pci/pcie/portdrv.c | 254 +++++++++++++++++++++++------- drivers/pci/pcie/portdrv.h | 37 +++-- drivers/perf/Kconfig | 1 + drivers/perf/cxl_pmu.c | 153 ++++++++++-------- include/linux/auxiliary_bus.h | 1 + 16 files changed, 558 insertions(+), 186 deletions(-) create mode 100644 drivers/pci/pcie/cxlpmu.c -- 2.39.2