[RFC PATCH 0/9] pci: portdrv: Add auxiliary bus and register CXL PMUs (and aer)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Focus of this RFC is CXL Performance Monitoring Units in CXL Root Ports +
Switch USP and DSPs.

The final patch moving AER to the aux bus is in the category of 'why
not try this' rather than something I feel is a particularly good idea.
I would like to get opinions on the various ways to avoid the double bus
situation introduced here. Some comments on other options for those existing
'pci_express' bus devices at the end of this cover letter.

The primary problem this RFC tries to solve is providing a means to
register the CXL PMUs found in devices which will be bound to the
PCIe portdrv. The proposed solution is to avoid the limitations of
the existing pcie service driver approach by bolting on support
for devices registered on the auxiliary_bus.

Background
==========

When the CXL PMU driver was added, only the instances found in CXL type 3
memory devices (end points) were supported. These PMUs also occur in CXL
root ports, and CXL switch upstream and downstream ports.  Adding
support for these was deliberately left for future work because theses
ports are all bound to the pcie portdrv via the appropriate PCI class
code.  Whilst some CXL support of functionality on RP and Switch Ports
is handled by the CXL port driver, that is not bound to the PCIe device,
is only instantiated as part of enumerating endpoints, and cannot use
interrupts because those are in control of the PCIe portdrv. A PMU with
out interrupts would be unfortunate so a different solution is needed.

Requirements
============

- Register multiple CXL PMUs (CPMUs) per portdrv instance.
- portdrv binds to the PCIe class code for PCI_CLASS_BRIDGE_PCI_NORMAL which
  covers any PCI-Express port.
- PCI MSI / MSIX message based interrupts must be registered by one driver -
  in this case it's the portdrv.
- portdrv goes through a dance to discover the minimal number of irq vectors
  required, and as part of this it needs to find out the highest vector number
  used.
- CXL PMUs store the interrupt message number (the one needed to find the
  above number of vectors registered) in a register in a BAR.  That
  BAR + offset of the register block is discovered from the CXL
  Register Locator DVSEC.  As such finding the maximum interrupt message
  number used by any CPMU requires checking all CXL PMU register blocks
  in the CXL Register Locator DVSEC, briefly mapping their register space
  and checking that one register.  This is safe to do because the rest
  of the CXL stack doesn't map this section of the BAR and the CXL PMU
  device is not added until after we have unmapped it again.

Dependency fun.
- The existing CXL_PMU driver registers the device on the CXL_BUS.
- portdrv is not modular.
- CONFIG_CXL_BUS is potentially modular (i.e. not available until the
  core CXL module is loaded).
- The portdrv can't be dependent on CONFIG_CXL_BUS directly without
  forcing CXL to be built in.

There are various ways to solve this dependency issue an meet the above
requirements.

1. Add an intermediate glue device that has a driver that can depend on
   CONFIG_CXL_BUS and hence ensure that is loaded before doing a full
   enumeration of the CXL PMU instances and registering  appropriate
   devices. To allow for simple tear down, device managed interfaces are
   used against the platform driver, so that when it goes away it removes
   the CPMU device instances cleanly. The parents are set one level higher
   however, so that path can be used to get to the underlying PCI Device
   (which can't go away without tearing everything down as it's the
    grandparent device).

                                                             On CXL BUS    
 _____________ __            ________________ __           ________ __________
|  Port       |  | creates  |                |  | creates |        |          |
|  PCI Dev    |  |--------->| PCIE CPMU GLUE |  |-------->| CPMU A |          |
|_____________|  |          | Platform Device|  |         |________|          |
|pordrv binds    |          |________________|  |         | perf/cxlpmu binds |
|________________|          |cxlpmu_glue binds  |         |___________________|
                            | enumerates CPMUs  |          ________ __________
                      Deps /|___________________|-------->|        |          |
                          /                               | CPMU B |          |
                    cxl_core.ko                           |________|          |
		    /  providing CXL_BUS                  | perf/cxlpmu binds | 
               Deps/                                      |___________________|
 _____________ __ /
|  Type 3     |  | creates                                 ________ _________
|  PCI Dev    |  |--------------------------------------->|        |         |
|_____________|  |                                        | CPMU C |         |
| cxl_pci binds  |                                        |________|         |
|________________|                                        | perf/cxpmu binds |
                                                          |__________________|

2. Make the CXL_PMU driver handle multiple buses. This will look similar
   to a sensor driver with I2C and SPI drivers but in this case register as a
   driver for devices on the CXL_BUS and one for another 'new' bus (e.g.
   the auxiliary bus which exists to enable this sort of hybrid driver)
   Register auxiliary devices directly from portdrv. This has to be done
   in multiple steps so enough interrupt vectors are allocated before the
   CPMU device is added and that driver might probe.

                                On auxiliary bus, children of the portdrv.
 _____________ __            ________ __________
|  Port       |  | creates  |        |          |
|  PCI Dev    |  |--------->| CPMU A |          |
|_____________|  |          |________|          |
|pordrv binds    |          | perf/cxlpmu binds |
|________________|          |___________________|
                 \         
                  \
                   \         ________ __________
                    \       |        |          |
                     ------>| CPMU A |          |    
                            |________|          |
                            | perf/cxlpmu binds |
                            |___________________|
 AND
 
                     cxl_core.ko           
		    /  providing CXL_BUS   
               Deps/                                         On CXL BUS
 _____________ __ /
|  Type 3     |  | creates                                 ________ _________
|  PCI Dev    |  |--------------------------------------->|        |         |
|_____________|  |                                        | CPMU C |         |
| cxl_pci binds  |                                        |________|         |
|________________|                                        | perf/cxpmu binds |
                                                          |__________________|                  

I have code for both and on balance option 2 is cleaner as no magic glue
devices are registered so that's what is presented in this RFC

AER experiment
==============

Note that this discussion oddly enough doesn't apply to the the CXL
PMU because we need a bus structure of some type to solve the
registration dependency problems.  The existing portdrv subdrivers
(or at least those I've looked at in detail) are much simpler.

Having two different types of bus under a portdrv instance is far
from ideal. Given I'm adding an auxbus and I wanted to poke some of
the corners + I have suitable CXL error injection support in QEMU
that lets me exercise the AER flows I decided to play a bit with that.
Note that there is some power management support in this patch set
and I hacked in callbacks to the AER driver to test that but it is
infrastructure that AER itself doesn't need.

In previous discussions various approaches have been proposed.

1) Auxiliary bus as done here.  It works, but is messy and there
   is realtively little point if the AER driver is always forced
   to be built in and load immediately anyway. Is there any interest
   in making the service drivers modular?

2) Turning the service drivers into straight forward library
   function calls.  This basically means providing some storage
   for their state and calling the bits of probe and remove
   not related to the struct device (which will no longer exist)
   directly.

I'm happy to look at either, but keen that if possible we don't
gate the CPMU support on that.  There have been discusisons around
the need for vendor specific telemetry solutions on CXL switches
and to try and encourage the upstream approach, I'd like to ensure
we have good support the facilities in the CXL specification.

Side question.  Does anyone care if /sys/bus/pci_express goes away?
- in theory it's ABI, but in practice it provides very little
  actual ABI.

If we do need to maintain that, refactoring this code gets much
harder and I suspect our only way forwards is a cut and paste
version of the auxiliary_bus that gives the flexibility needed
but would still provide that /sys/bus/pci_express.
Not nice if we can avoid it!

Other misc questions / comments.
- Can we duplicate the irq discovery into the auxiliary_bus drivers?
  They do need to be discovered in pordrv to work out how many
  vectors to request, but the current multipass approach of
  (1. Discovery, 2. IRQ pass one, 3. Store the IRQs) means
  auxiliary devices can only be created in an additional pass 4
  unless we don't pass them the irqs and allow the child device
  drivers to query it directly when they load.  Note this also
  avoids the theoretical possiblity that they move when portdrv
  reduces the requested vectors (to avoid waste).
- Additional callbacks for auxiliary_driver needed for runtime pm.
  For now I've just done suspend and resume, but we need the
  noirq variant and runtime_suspend / runtime_resume callbacks
  to support all the existing pcie_driver handled subdevices.
- Worth moving any of the drivers that register in parallel
  with the pcie portdrv over to this new approach (the Designware
  PMU for example?)
  
Jonathan Cameron (9):
  pci: pcie: Drop priv_data from struct pcie_device and use
    dev_get/set_drvdata() instead.
  pci: portdrv: Drop driver field for port type.
  pci: pcie: portdrv: Use managed device handling to simplify error and
    remove flows.
  auxiliary_bus: expose auxiliary_bus_type
  pci: pcie: portdrv: Add a auxiliary_bus
  cxl: Move CPMU register definitions to header
  pci: pcie/cxl: Register an auxiliary device for each CPMU instance
  perf: cxl: Make the cpmu driver also work with auxiliary_devices
  pci: pcie: portdrv: aer: Switch to auxiliary_bus

 drivers/base/auxiliary.c          |   2 +-
 drivers/cxl/pmu.h                 |  54 +++++++
 drivers/pci/hotplug/pciehp_core.c |  13 +-
 drivers/pci/hotplug/pciehp_hpc.c  |   2 +-
 drivers/pci/pci-driver.c          |   4 -
 drivers/pci/pcie/Kconfig          |  10 ++
 drivers/pci/pcie/Makefile         |   1 +
 drivers/pci/pcie/aer.c            |  68 +++++---
 drivers/pci/pcie/cxlpmu.c         | 129 +++++++++++++++
 drivers/pci/pcie/dpc.c            |   1 -
 drivers/pci/pcie/pme.c            |  14 +-
 drivers/pci/pcie/portdrv.c        | 254 +++++++++++++++++++++++-------
 drivers/pci/pcie/portdrv.h        |  37 +++--
 drivers/perf/Kconfig              |   1 +
 drivers/perf/cxl_pmu.c            | 153 ++++++++++--------
 include/linux/auxiliary_bus.h     |   1 +
 16 files changed, 558 insertions(+), 186 deletions(-)
 create mode 100644 drivers/pci/pcie/cxlpmu.c

-- 
2.39.2





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux