On Thu, 6 Jun 2024 15:21:02 +0200 Lukas Wunner <lukas@xxxxxxxxx> wrote:

> On Thu, Jun 06, 2024 at 01:57:56PM +0100, Jonathan Cameron wrote:
> > Or are you thinking we can make something like the following work
> > (even when we can't do dynamic msix)?
> >
> > Core brings up facilities and interrupts etc. That includes bus master
> > so msi/msix can be issued (which makes me nervous - putting aside other
> > concerns, IIRC there are drivers where you have to be careful to
> > tweak registers to ensure you don't get a storm of traffic when you
> > hit bus master enable).
> >
> > A specific driver may bind later - everything keeps running until
> > the specific driver calls pci_alloc_irq_vectors(); then we have to
> > disable all interrupts for core managed services, change the number
> > of vectors and then move them to wherever they end up before starting
> > them again. We have to do something similar in the unwind path.
>
> My recollection is that Thomas Gleixner has brought up dynamic MSI-X
> allocation. So if both the PCI core services as well as the driver
> use MSI-X, there's no problem.
>
> For MSI, one approach might be to reserve a certain number of vectors
> in the core for later use by the driver.

So, my first attempt at doing things in the core ran into what I think
is probably a blocker. It's not entirely technical...

+CC Thomas, who can probably very quickly confirm whether my reasoning
is correct.

If we move these services into the PCI core itself (as opposed to keeping
a PCI portdrv layer), then we need to bring up MSI for a device before we
bind a driver to that struct pci_dev / struct device.

If we follow through

pci_alloc_irq_vectors_affinity()
  -> __pci_enable_msix_range() / __pci_enable_msi_range()
    -> pci_setup_msi_context()
      -> msi_setup_device_data()
        -> devres_add()
           // there is actually devres usage before this in
           // msi_sysfs_create_group(), but this is a shorter path to
           // illustrate the point.

we will have registered a devres callback on the struct pci_dev that we
later want to be able to bind a driver to. That driver won't bind, because
the devres lifetimes will be garbage (see really_probe() for the specific
check and the resulting "Resources present before probing" error).
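For reference, the check I mean looks roughly like this (paraphrased from
really_probe() in drivers/base/dd.c; the exact form varies by kernel
version):

        /*
         * A device that already has devres resources attached is refused
         * before the driver's probe() callback ever runs.
         */
        if (!list_empty(&dev->devres_head)) {
                dev_crit(dev, "Resources present before probing\n");
                ret = -EBUSY;
                goto done;
        }

Anything the core hangs off devres before the driver binds - which the MSI
setup path above does - trips that check and the probe fails with -EBUSY.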
So we in theory 'could' provide a non-devres path through the MSI code,
but I'm going to guess that Thomas will not be particularly keen on that
just to support this single use case.

Thomas, can you confirm whether this is something you'd consider or would
outright reject? Or is there an existing route to have MSI/MSI-X working
pre driver bind and still be able to bind the driver later?

Hence, subject to Thomas' reply to the above, I think we are back to
having a PCI portdrv. The options then become a case of tidying everything
up and supporting one of (or a combination of) the following:

1) Make all functionality currently handled by portdrv available in an
   easily consumed form. Handle 'extended' drivers by unbinding the class
   driver and binding another one (if the more specific driver happened
   not to bind) - we could make this consistent by checking for class
   matches first so we always land the generic driver if multiple are
   ready to go. This may be as simple as a
   'devm_init_pcie_portdrv_standard(int mymaxmsgnum);' call in the probe
   of an extended driver.

2) Add a lightweight static path for adding additional 'services' to
   portdrv, similar to the current detection code that works out which
   message numbers are going to be needed.

3) Make portdrv support dynamic child drivers, including increasing the
   number of irq vectors if needed.

4) Do nothing at all and revisit next year (that 'worked' for the last
   few years :)

Option 3 is the most flexible, but is it worth it?

- If we resize the irq vectors, we will need to free all interrupts
  anyway and then reallocate them because they may move (maybe we can
  elide that if we are lucky for some devices).
- We will need to handle interrupts missed whilst they were disabled.
- Each 'child driver' will have to provide a 'detection' routine that
  will be called for all registered instances so that we can match
  against particular capabilities, stuff from BARs etc.
- Maybe we treat the 'normal' facilities from the PCI spec specially to
  avoid the interrupt enable/disable/enable dances except when there is
  something 'extra' present. That is more likely to hide odd bugs,
  however.

I'm thinking option 3 is a case of over-engineering something better
solved by 1 (or maybe 2).

There will be an LPC uconf session on this, but I'm going to have some
more time before LPC for exploration, so any inputs now might help focus
that effort. If not, see you in Vienna.

Jonathan