Re: Multitude of resource assignment functions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 24, 2019 at 10:45:17AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2019-06-24 3:13 a.m., Benjamin Herrenschmidt wrote:
> > So I'm staring at these three mostly at this point:
> > 
> > void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
> > void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
> > void pci_assign_unassigned_bus_resources(struct pci_bus *bus)
> > 
> > Now we have 3 functions that fundamentally have the same purpose,
> > assign what was left unassigned down a PCI hierarchy, but are going
> > about it in quite a different manner.
> > 
> > Now to make things worse, there's little consistency in which one gets
> > called where. We have PCI controllers calling the first one sometimes,
> > the last one sometimes, or doing the manual:
> > 
> > 	pci_bus_size_bridges(bus);
> > 	pci_bus_assign_resources(bus);
> > 
> > Or variants with pci_bus_size_bridges sometimes missing etc...
> 
> I suspect there isn't much rhyme or reason to it. None of this is well
> documented so developers writing the controller drivers probably didn't
> have a good idea of what the correct thing to do was, and just stuck
> with the first thing that worked.
> 
> > Now I've consolidated a lot of that and removed all of those "manual"
> > cases in my work-in-progress branch, but I'd like to clarify and
> > possibly remove the 3 ones above.
> > 
> > Let's start with the last one, pci_assign_unassigned_bus_resources, as
> > it's the easiest to remove from users in drivers/pci/controller/* (and
> > replace with pci_assign_unassigned_root_bus_resources typically).
> > 
> > This leaves it used in a couple of corner cases, most of them I think
> > I can kill, and .... sysfs 'rescan'.
> > 
> > The interesting thing about that function is that it tries to avoid
> > resizing the bridge of the bus passed as an argument, it will only
> > resize subordinate bridges. From the changelog it was created for
> > hotplug bridges, but almost none uses it (some powerpc stuff I can
> > probably kill) ... and sysfs rescan.
> > 
> > I wonder what's the remaining purpose of it. sysfs rescan could
> > probably be cleaned up to use the two first... Also why avoid resizing
> > the bridge itself ?
> > 
> > That leads to the difference between
> > pci_assign_unassigned_root_bus_resources()
> > and pci_assign_unassigned_bridge_resources().
> > 
> > The names are misleading. The former isn't just about the root bus
> > resources. It's about the entire tree underneath the root bus.
> > 
> > The main difference that I can tell are:
> > 
> >  - pci_assign_unassigned_root_bus_resources() may or may not try to
> > realloc, depending on a combination of command line args, config
> > option, presence of IOV devices etc... while
> > pci_assign_unassigned_bridge_resources() always will
> > 
> >  - pci_assign_unassigned_bridge_resources() will call
> > pci_bridge_distribute_available_resources() to distribute resource to
> > child hotplug bridges, while pci_assign_unassigned_root_bus_resources()
> > won't.
> >
> > Now, are we 100% confident we want to keep those discrepancies ?
> > 
> > It feels like the former function is intended for boot time resource
> > allocation, and the latter for hotplug, but I can't make sense of why
> > the resources of a device behind a hotplug bridge should be allocated
> > differently depending on whether that device was plugged at boot or
> > plugged later.
> 
> I don't really know, but I kind of assumed reallocing any time but early
> in boot would be dangerous. It involves un-assigning a bunch of
> resources without any real check to see if a driver is using them or
> not. If they were being used by a driver (which is typical) and they
> were reassigned, everything would break.
> 
> I mean, in theory the code could/should be the same for both paths and
> it could just make a single, better decision on whether to realloc or
> not. But that's going to be challenging to get there.
> 
> > Also why not distribute available resources at boot between top level
> > hotplug bridges ?
> >
> > I'm not even going into the question of why the resource
> > sizing/assignment code is so obscure/cryptic/incomprehensible, that's
> > another kettle of fish, but I'd like to at least clarify the usage
> > patterns a bit better.
> I got the impression the code was designed to generally let the firmware
> set things up -- it just fixed things up if the firmware messed it up
> somehow. My guess would be it evolved out of a bunch of hacks designed
> to fix broken bioses into something new platforms used to do full
> enumeration (because it happened to work).
Unfortunately, the operating system is designed to let the firmware do 
things. In my mind, ACPI should not need to exist, and the operating 
system should start with a clean state with PCI and re-enumerate 
everything at boot time. The PCI allocation is so broken and 
inconsistent (as you have noted) because it tries to combine the two, 
when firmware enumeration and native enumeration should be mutually 
exclusive. I have attempted to re-write large chunks of probe.c, pci.c 
and setup-bus.c to completely disregard firmware enumeration and clean 
everything up. Unfortunately, I get stuck in probe.c with the double 
recursive loop which assigns bus numbers - I cannot figure out how to 
re-write it successfully. Plus, I feel like nobody will be ready for 
such a drastic change - I am having trouble selling minor changes that 
fix actual use cases, as opposed to code reworking.

My next proposal might be a kernel parameter for PCI to set various 
levels of disregard for firmware, from none to complete, which can be 
added to incrementally to do more and more (rather than all in one patch 
series). This can supercede pci=realloc. The realloc command is so 
broken because once the system has loaded drivers, it becomes next to 
impossible to free and reallocate a resource to fit another device in - 
because it will upset existing devices. The realloc command is only 
useful in early boot because nothing is yet assigned, so it works. 
However, the same effect can be achieved by releasing all the resources 
on the root port before anything happens. I think it was 
pci_assign_unassigned_resources(), and I did verify this experimentally. 
This switch could be part of such a new kernel parameter to ignore 
firmware influence on PCI.

I hope that somehow we can transition to ignoring the firmware - because 
firmware and native enumeration need to be mutually exclusive, and we 
need native enumeration for PCI hotplug. If anybody has any ideas how, I 
would love to hear.

Nicholas

> 
> Logan



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux