Re: [RFC PATCH 0/9] pci: portdrv: Add auxiliary bus and register CXL PMUs (and aer)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 5 Jun 2024 13:04:09 -0500
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> On Wed, May 29, 2024 at 05:40:54PM +0100, Jonathan Cameron wrote:
> > Focus of this RFC is CXL Performance Monitoring Units in CXL Root Ports +
> > Switch USP and DSPs.
> > 
> > The final patch moving AER to the aux bus is in the category of 'why
> > not try this' rather than something I feel is a particularly good idea.
> > I would like to get opinions on the various ways to avoid the double bus
> > situation introduced here. Some comments on other options for those existing
> > 'pci_express' bus devices at the end of this cover letter.
> > 
> > The primary problem this RFC tries to solve is providing a means to
> > register the CXL PMUs found in devices which will be bound to the
> > PCIe portdrv. The proposed solution is to avoid the limitations of
> > the existing pcie service driver approach by bolting on support
> > for devices registered on the auxiliary_bus.
> > 
> > Background
> > ==========
> > 
> > When the CXL PMU driver was added, only the instances found in CXL type 3
> > memory devices (end points) were supported. These PMUs also occur in CXL
> > root ports, and CXL switch upstream and downstream ports.  Adding
> > support for these was deliberately left for future work because theses
> > ports are all bound to the pcie portdrv via the appropriate PCI class
> > code.  Whilst some CXL support of functionality on RP and Switch Ports
> > is handled by the CXL port driver, that is not bound to the PCIe device,
> > is only instantiated as part of enumerating endpoints, and cannot use
> > interrupts because those are in control of the PCIe portdrv. A PMU with
> > out interrupts would be unfortunate so a different solution is needed.
> > 
> > Requirements
> > ============
> > 
> > - Register multiple CXL PMUs (CPMUs) per portdrv instance.  
> 
> What resources do CPMUs use?  BARs?  Config space registers in PCI
> Capabilities?  Interrupts?  Are any of these resources
> device-specific, or are they all defined by the CXL specs?

Uses the whole lot (there isn't a toy out there that the CXL
spec doesn't like to play with :). Discoverable via a CXL defined DVSEC
in config space. Registers are in a BAR (discovered from the DVSEC).
MSI/MSIX, number in a register in the BAR.

All the basic infrastructure and some performance event definitions
are in the CXL spec.  It's all discoverable. 

The events are even tagged with VID so a vendor can implement
someone else's definitions or those coming from another specification.
A bunch of CXL VID tagged ones exist for protocol events etc.

It's a really nice general spec.  If I were more of a dreamer and had
the time I'd try to get PCI-SIG to adopt it and we'd finally have
something generic for PCI performance counting.

> 
> The "device" is a nice way to package those up and manage ownership
> since there are often interdependencies.
> 
> I wish portdrv was directly implemented as part of the PCI core,
> without binding to the device as a "driver".  We handle some other
> pieces of functionality that way, e.g., the PCIe Capability itself,
> PM, VPD, MSI/MSI-X, 

Understood, though I would guess that even then there needs to
be a pci_driver binding to the class code to actually query all those
facilities and get interrupts registered etc.

> 
> > - portdrv binds to the PCIe class code for PCI_CLASS_BRIDGE_PCI_NORMAL which
> >   covers any PCI-Express port.
> > - PCI MSI / MSIX message based interrupts must be registered by one driver -
> >   in this case it's the portdrv.  
> 
> The fact that AER/DPC/hotplug/etc (the portdrv services) use MSI/MSI-X
> is a really annoying part of PCIe because bridges may have a BAR or
> two and are allowed to have device-specific endpoint-like behavior, so
> we may have to figure out how to share those interrupts between the
> PCIe-architected stuff and the device-specific stuff.  I guess that's
> messy no matter whether portdrv is its own separate driver or it's
> integrated into the core.

Yes, the interrupt stuff is the tricky bit.  I think whatever happens
we end up with a pci device driver that binds to the class code.
Maybe we have a way to switch in alternative drivers if that turns
out to be necessary.

Trying to actually manage the interrupts in the core (without a driver binding)
is really tricky.  Maybe we'll get there one day, but I don't think
it is the first target.  We should however make sure not to do anything
that would make that harder such as adding ABI in the wrong places.

> 
> > Side question.  Does anyone care if /sys/bus/pci_express goes away?
> > - in theory it's ABI, but in practice it provides very little
> >   actual ABI.  
> 
> I would *love* to get rid of /sys/bus/pci_express, but I don't know
> what, if anything, would break if we removed it.

Could try it and see who screams ;) or fake it via symlinks or similar
(under a suitable 'legacy' config variable that is on by default.)

How about I try the following steps:
0) An experiment to make sure I can fake /sys/bus/pci_express so it's
   at least hard to tell there aren't real 'devices' registered on that
   bus. Even if we decide not to do that long term, we need to know
   we can if a regressions report comes in.
   Worst comes to the worse we can register fake devices that fake
   drivers bind to that don't do anything beyond fill in sysfs.
1) Replace the bus/pci_express with direct functional calls in
   portdrv.  
   I'm thinking we have a few calls for each:
    bool aer_present(struct pci_dev *pci);
    aer_query_max_message_num() etc - used so we can do the msi/msix bit.
    aer_initialize()
    aer_finalize()
    aer_runtime_suspend() (where relevant for particular services)

   at the cost of a few less loops and a few more explicit calls
   that should simplify how things fit together.

2) Fix up anything in all that functionality that really cares
   that they are part of portdrv.  Not sure yet, but we may need
   a few additional things in struct pci_dev, or a to pass in some
   state storage in the ae_initialize() call or similar.
3) Consider moving that functionality to a more appropriate place.
   At this point we'll have services that can be used by other
   drivers - though I'm not yet sure how useful that will be.
4) End up with a small pci_driver that binds to pci devices with
   the appropriate class code. Should even be able to make it
   a module.
5) (maybe do this first) carry my auxiliary_bus + cpmu stuff
   and allow adopting other similar cases alongside the other
   parts.

Of course, no statements on 'when' I'll make progress if this seems
like a step forwards, but probably sometime this summer. Maybe sooner.

Jonathan

> 
> Bjorn





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux