On Tue, 16 May 2023 14:17:52 -0500 Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> On Tue, May 16, 2023 at 04:03:04PM +0100, Jonathan Cameron wrote:
> > 
> > PCI folks, Question below directed at you. Please take a look.
> > +CC linux-cxl because a similar question is going to bite us shortly
> > if we want CXL PMUs to work well on RP or Switch ports.
> > 
> > > >> +static int dwc_pcie_ras_des_discover(struct dwc_pcie_pmu_priv *priv)
> > > >> +{
> > > >> +	int index = 0;
> > > >> +	struct pci_dev *pdev = NULL;
> > > >> +	struct dwc_pcie_rp_info *rp_info;
> > > >> +
> > > >> +	INIT_LIST_HEAD(&priv->rp_infos);
> > > >> +
> > > >> +	/* Match the rootport with VSEC_RAS_DES_ID */
> > > >> +	for_each_pci_dev(pdev) {
> > > > 
> > > > Does the PCI layer not offer a more robust mechanism for this?
> > > > (PCI fixups come to mind, but I don't actually know whether that
> > > > would be a viable approach or not.)
> > > 
> > > I am afraid not yet. Jonathan tried to add a PMU service but it is
> > > not merged into mainline.
> > 
> > I wouldn't read much into that 'failure'. We never persisted with
> > that driver because it was for an old generation of hardware.
> > Mostly the aim with that was to explore the area of PCIe PMU in
> > general rather than to get the support upstream. Some of the
> > counters on that hardware were too small to be of much use anyway :)
> > 
> > Grabbing just the relevant functions.
> > 
> > Bjorn, we need to figure out a way forwards for this sort of case
> > and I'd appreciate your input on the broad brush question of 'how
> > should it be done'?
> > 
> > This is a case where a PCIe port (RP here) correctly has the PCIe
> > class code so binds to the pcie_port driver, but has a VSEC (other
> > examples use DOE, or DVSEC) that provides extended functionality.
> > The PCIe PMU referred to above, from our older HiSilicon platforms,
> > did it by adding another service driver; that probably doesn't
> > extend well.
> > 
> > The approach used here is to separately walk the PCI topology and
> > register the devices. It can 'maybe' get away with that because
> > there are no interrupts, and I assume resets have no nasty impact on
> > it because the device is fairly simple. In general that's not going
> > to work. CXL does a similar trick (which I don't much like, but too
> > late now), but we've also run into the problem of how to get
> > interrupts if we are not the main driver.
> 
> Yes, this is a real problem. I think the "walk all PCI devices
> looking for one we like" approach is terrible because it breaks a lot
> of driver model assumptions (no device ID to autoload module via udev,
> hotplug doesn't work, etc), but we don't have a good alternative right
> now.
> 
> I think portdrv is slightly better because at least it claims the
> device in the usual way and gives a way for service drivers to
> register with it. But I don't really like that either because it
> created a new weird /sys/bus/pci_express hierarchy full of these
> sub-devices that aren't really devices, and it doesn't solve the
> module load and hotplug issues.
> 
> I would like to have portdrv be completely built into the PCI core and
> not claim Root Ports or Switch Ports. Then those devices would be
> available via the usual driver model for driver loading and binding
> and for hotplug.

Let me see if I understand this correctly, as I can think of a few
options that are perhaps in line with what you are thinking.

1) All the portdrv stuff is converted to normal PCI core helper
   functions that a driver bound to the struct pci_dev can use.
2) The driver core itself provides a bunch of extra devices alongside
   the struct pci_dev one, to which additional drivers can bind? So a
   kind of portdrv handling, but squashed into the PCI device topology?
3) Have portdrv operate under the hood, so none of the services it
   provides require a driver to be bound at all. Then allow the usual
   VID/DID-based driver binding.
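To make the hotplug half of the driver-model objection concrete, here is a minimal userspace model in C. Every type and function in it is a hypothetical stand-in, not a kernel API, and the vendor and VSEC IDs are made up. It contrasts a one-shot discovery walk done at init time (in the spirit of the for_each_pci_dev() loop quoted above) with a probe callback that the "bus" keeps and also calls for devices added later, which is roughly what normal driver binding provides. It does not attempt to model module autoloading, interrupts or resets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of device discovery. Nothing here is a
 * kernel API; MODEL_VENDOR_ID and MODEL_VSEC_ID are made-up values.
 */
#define MAX_DEVS 8
#define MODEL_VENDOR_ID 0x1234 /* hypothetical vendor ID */
#define MODEL_VSEC_ID 0x02     /* hypothetical VSEC ID of interest */

struct model_dev {
	unsigned short vendor;
	unsigned short vsec_id; /* 0: no VSEC of interest */
	bool bound;             /* has the PMU code attached to it? */
};

struct model_bus {
	size_t ndevs;
	struct model_dev devs[MAX_DEVS];
	/* Set once a driver registers; also called for later hot-adds. */
	bool (*probe)(struct model_dev *dev);
};

static bool wants_dev(const struct model_dev *dev)
{
	return dev->vendor == MODEL_VENDOR_ID && dev->vsec_id == MODEL_VSEC_ID;
}

/* Claim the device only if it carries the VSEC we care about. */
static bool probe_cb(struct model_dev *dev)
{
	if (!wants_dev(dev))
		return false;
	dev->bound = true;
	return true;
}

/* "Walk" style: a single pass over the devices present right now. */
static size_t walk_once(struct model_bus *bus)
{
	size_t found = 0;

	for (size_t i = 0; i < bus->ndevs; i++) {
		if (probe_cb(&bus->devs[i]))
			found++;
	}
	return found;
}

/* Driver-model style: probe existing devices and remember the callback. */
static void register_driver(struct model_bus *bus,
			    bool (*probe)(struct model_dev *dev))
{
	bus->probe = probe;
	for (size_t i = 0; i < bus->ndevs; i++)
		probe(&bus->devs[i]);
}

/* Hot-add: only a registered callback gets to see the new device. */
static void hotplug_add(struct model_bus *bus, struct model_dev dev)
{
	assert(bus->ndevs < MAX_DEVS);
	bus->devs[bus->ndevs] = dev;
	if (bus->probe)
		bus->probe(&bus->devs[bus->ndevs]);
	bus->ndevs++;
}

static size_t count_bound(const struct model_bus *bus)
{
	size_t n = 0;

	for (size_t i = 0; i < bus->ndevs; i++)
		if (bus->devs[i].bound)
			n++;
	return n;
}
```

In this model, a matching port hot-added after walk_once() has run stays unbound, whereas the same port added after register_driver() is picked up by the stored callback.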
If 1, we are going to run into class device restrictions, and that will
just move where we have to handle the potential vendor-specific parts.
We probably don't want that driver to be a hydra with all the
functionality and lookups etc. driven from there, so do we end up with
sub-devices of that new PCI port driver, discovered based on either
VSEC + VID or DVSEC, with devices created under the main pci_dev? That
would have to include nastiness around interrupt discovery for those
sub-devices, so it ends up roughly like portdrv.

I don't think 2 solves anything.

For 3, interrupts and ownership of facilities are going to be tricky,
as initially those need to be owned by the PCI core (no device driver
bound) and then, I guess, handed off to the driver once it shows up?
Maybe that driver should call a pci_claim_port() that gives it control
of everything and a pci_release_port() that hands it all back to the
core. That seems racy.

As another proposal similar to 3 (and one Greg KH will hate :), can we
do something similar to vfio and allow an unbind of a class driver
followed by a bind of a more specific one?

I think 1 is probably the easiest to implement, but it just moves the
problem. If we had a way to reliably override the class driver when a
more specific one exists, that might work around the problem, but I
don't think we can do that currently.

Jonathan

> 
> Bjorn