On Thu, 6 Jun 2024 15:21:02 +0200 Lukas Wunner <lukas@xxxxxxxxx> wrote:

> On Thu, Jun 06, 2024 at 01:57:56PM +0100, Jonathan Cameron wrote:
> > Or are you thinking we can make something like the following work
> > (even when we can't do dynamic msix)?
> >
> > Core brings up facilities and interrupts etc. That includes bus master
> > so msi/msix can be issued (which makes me nervous - putting aside other
> > concerns, IIRC there are drivers where you have to be careful to
> > tweak registers to ensure you don't get a storm of traffic when you
> > hit bus master enable).
> >
> > A specific driver may bind later - everything keeps running until
> > the specific driver calls pci_alloc_irq_vectors(); then we have to
> > disable all interrupts for core managed services, change the number
> > of vectors and then move them to wherever they end up before starting
> > them again. We have to do something similar in the unwind path.
>
> My recollection is that Thomas Gleixner has brought up dynamic MSI-X
> allocation. So if both the PCI core services as well as the driver
> use MSI-X, there's no problem.
>
> For MSI, one approach might be to reserve a certain number of vectors
> in the core for later use by the driver.

So, my first attempt at doing things in the core ran into what I think
is probably a blocker. It's not entirely technical...

+CC Thomas, who can probably very quickly confirm whether my reasoning
is correct.

If we move these services into the PCI core itself (as opposed to keeping
a PCI portdrv layer), then we need to bring up MSI for a device before we
bind a driver to that struct pci_dev / struct device.

If we follow through

pci_alloc_irq_vectors_affinity()
  -> __pci_enable_msix_range() / __pci_enable_msi_range()
    -> pci_setup_msi_context()
      -> msi_setup_device_data()
        -> devres_add()
           // there is actually devres usage before this in
           // msi_sysfs_create_group(), but this is a shorter path to
           // illustrate the point.

we will have registered a devres callback on the struct pci_dev that we
later want to be able to bind a driver to. That driver won't bind, because
the devres lifetimes will be garbage (see really_probe() for the specific
check and the resulting "Resources present before probing" error).
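For reference, the check I mean looks roughly like this (paraphrased from
really_probe() in drivers/base/dd.c; the exact form varies by kernel
version):

        /*
         * A device that already has devres resources attached is refused
         * before the driver's probe() callback ever runs.
         */
        if (!list_empty(&dev->devres_head)) {
                dev_crit(dev, "Resources present before probing\n");
                ret = -EBUSY;
                goto done;
        }

Anything the core hangs off devres before the driver binds - which the MSI
setup path above does - trips that check and the probe fails with -EBUSY.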
So we in theory 'could' provide a non-devres path through the MSI code,
but I'm going to guess that Thomas will not be particularly keen on that
just to support this single use case.

Thomas, can you confirm whether this is something you'd consider or would
outright reject? Or is there an existing route to have MSI/MSI-X working
pre driver bind and still be able to bind the driver later?

Hence, subject to Thomas' reply to the above, I think we are back to
having a PCI portdrv. The options then become a case of tidying everything
up and supporting one of (or a combination of) the following:

1) Make all functionality currently handled by portdrv available in an
   easily consumed form. Handle 'extended' drivers by unbinding the class
   driver and binding another one (if the more specific driver happened
   not to bind) - we could make this consistent by checking for class
   matches first so we always land the generic driver if multiple are
   ready to go. This may be as simple as a
   'devm_init_pcie_portdrv_standard(int mymaxmsgnum);' call in the probe
   of an extended driver.

2) Add a lightweight static path for adding additional 'services' to
   portdrv, similar to the current detection code that works out which
   message numbers are going to be needed.

3) Make portdrv support dynamic child drivers, including increasing the
   number of irq vectors if needed.

4) Do nothing at all and revisit next year (that 'worked' for the last
   few years :)

Option 3 is the most flexible, but is it worth it?

- If we resize the irq vectors, we will need to free all interrupts
  anyway and then reallocate them because they may move (maybe we can
  elide that if we are lucky for some devices).
- We will need to handle interrupts missed whilst they were disabled.
- Each 'child driver' will have to provide a 'detection' routine that
  will be called for all registered instances so that we can match
  against particular capabilities, stuff from BARs etc.
- Maybe we treat the 'normal' facilities from the PCI spec specially to
  avoid the interrupt enable/disable/enable dances except when there is
  something 'extra' present. That is more likely to hide odd bugs,
  however.

I'm thinking option 3 is a case of over-engineering something better
solved by 1 (or maybe 2).

There will be an LPC uconf session on this, but I'm going to have some
more time before LPC for exploration, so any inputs now might help focus
that effort. If not, see you in Vienna.

Jonathan