Re: Multitude of resource assignment functions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 27, 2019 at 10:35:12AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2019-06-27 1:40 a.m., Nicholas Johnson wrote:
> > On Mon, Jun 24, 2019 at 10:45:17AM -0600, Logan Gunthorpe wrote:
> >>
> >>
> >> On 2019-06-24 3:13 a.m., Benjamin Herrenschmidt wrote:
> >>> So I'm staring at these three mostly at this point:
> >>>
> >>> void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
> >>> void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
> >>> void pci_assign_unassigned_bus_resources(struct pci_bus *bus)
> >>>
> >>> Now we have 3 functions that fundamentally have the same purpose,
> >>> assign what was left unassigned down a PCI hierarchy, but are going
> >>> about it in quite a different manner.
> >>>
> >>> Now to make things worse, there's little consistency in which one gets
> >>> called where. We have PCI controllers calling the first one sometimes,
> >>> the last one sometimes, or doing the manual:
> >>>
> >>> 	pci_bus_size_bridges(bus);
> >>> 	pci_bus_assign_resources(bus);
> >>>
> >>> Or variants with pci_bus_size_bridges sometimes missing etc...
> >>
> >> I suspect there isn't much rhyme or reason to it. None of this is well
> >> documented so developers writing the controller drivers probably didn't
> >> have a good idea of what the correct thing to do was, and just stuck
> >> with the first thing that worked.
> >>
> >>> Now I've consolidated a lot of that and removed all of those "manual"
> >>> cases in my work-in-progress branch, but I'd like to clarify and
> >>> possibly remove the 3 ones above.
> >>>
> >>> Let's start with the last one, pci_assign_unassigned_bus_resources, as
> >>> it's the easiest to remove from users in drivers/pci/controller/* (and
> >>> replace with pci_assign_unassigned_root_bus_resources typically).
> >>>
> >>> This leaves it used in a couple of corner cases, most of them I think
> >>> I can kill, and .... sysfs 'rescan'.
> >>>
> >>> The interesting thing about that function is that it tries to avoid
> >>> resizing the bridge of the bus passed as an argument, it will only
> >>> resize subordinate bridges. From the changelog it was created for
> >>> hotplug bridges, but almost none uses it (some powerpc stuff I can
> >>> probably kill) ... and sysfs rescan.
> >>>
> >>> I wonder what's the remaining purpose of it. sysfs rescan could
> >>> probably be cleaned up to use the two first... Also why avoid resizing
> >>> the bridge itself ?
> >>>
> >>> That leads to the difference between
> >>> pci_assign_unassigned_root_bus_resources()
> >>> and pci_assign_unassigned_bridge_resources().
> >>>
> >>> The names are misleading. The former isn't just about the root bus
> >>> resources. It's about the entire tree underneath the root bus.
> >>>
> >>> The main difference that I can tell are:
> >>>
> >>>  - pci_assign_unassigned_root_bus_resources() may or may not try to
> >>> realloc, depending on a combination of command line args, config
> >>> option, presence of IOV devices etc... while
> >>> pci_assign_unassigned_bridge_resources() always will
> >>>
> >>>  - pci_assign_unassigned_bridge_resources() will call
> >>> pci_bridge_distribute_available_resources() to distribute resource to
> >>> child hotplug bridges, while pci_assign_unassigned_root_bus_resources()
> >>> won't.
> >>>
> >>> Now, are we 100% confident we want to keep those discrepancies ?
> >>>
> >>> It feels like the former function is intended for boot time resource
> >>> allocation, and the latter for hotplug, but I can't make sense of why
> >>> the resources of a device behind a hotplug bridge should be allocated
> >>> differently depending on whether that device was plugged at boot or
> >>> plugged later.
> >>
> >> I don't really know, but I kind of assumed reallocing any time but early
> >> in boot would be dangerous. It involves un-assigning a bunch of
> >> resources without any real check to see if a driver is using them or
> >> not. If they were being used by a driver (which is typical) and they
> >> were reassigned, everything would break.
> >>
> >> I mean, in theory the code could/should be the same for both paths and
> >> it could just make a single, better decision on whether to realloc or
> >> not. But that's going to be challenging to get there.
> >>
> >>> Also why not distribute available resources at boot between top level
> >>> hotplug bridges ?
> >>>
> >>> I'm not even going into the question of why the resource
> >>> sizing/assignment code is so obscure/cryptic/incomprehensible, that's
> >>> another kettle of fish, but I'd like to at least clarify the usage
> >>> patterns a bit better.
> >> I got the impression the code was designed to generally let the firmware
> >> set things up -- it just fixed things up if the firmware messed it up
> >> somehow. My guess would be it evolved out of a bunch of hacks designed
> >> to fix broken bioses into something new platforms used to do full
> >> enumeration (because it happened to work).
> > Unfortunately, the operating system is designed to let the firmware do 
> > things. In my mind, ACPI should not need to exist, and the operating 
> > system should start with a clean state with PCI and re-enumerate 
> > everything at boot time. The PCI allocation is so broken and 
> > inconsistent (as you have noted) because it tries to combine the two, 
> > when firmware enumeration and native enumeration should be mutually 
> > exclusive. I have attempted to re-write large chunks of probe.c, pci.c 
> > and setup-bus.c to completely disregard firmware enumeration and clean 
> > everything up. Unfortunately, I get stuck in probe.c with the double 
> > recursive loop which assigns bus numbers - I cannot figure out how to 
> > re-write it successfully. Plus, I feel like nobody will be ready for 
> > such a drastic change - I am having trouble selling minor changes that 
> > fix actual use cases, as opposed to code reworking.
> 
> My worry would be if the firmware depends on any of those PCI resources
> for any of it's calls. For example, laptop firmware often has specific
> code for screen blanking/dimming when the special buttons are pressed.
> If it implements this by communicating with a PCI device then the kernel
> will break things by reassigning all the addresses.
> 
> However, having a kernel parameter to ignore the firmware choices might
> be a good way for us to start testing whether this is a problem or not
> on some systems
> 
> Logan
If Bjorn also agrees then I will give it a shot when I have finished 
with Thunderbolt.

Some other related thoughts:

- Should pci=noacpi imply pci=nocrs? It does not appear to, and I feel 
like it should, as CRS is part of ACPI and relates to PCI.

- Does anybody know why with pci=noacpi, you get dmesg warnings about 
cannot find PCI int A mapping - but they do not seem to cause the 
devices any issues in functioning? Is it because they are using MSI?

- Does pci=ignorefw sound good for a future proposal?

- Modern arches could give this option by default if they want 
everything done by the OS. Although this would not be nearly as nice as 
a code overhaul or branching out pci into pci-old and pci-new.

- Thunderbolt has given me a glimmer of hope. It used to be so tightly 
integrated into the system firmware and add-in cards were not even 
detectable without it (you need to hit up the pcie2tbt mailbox in the 
BIOS to wake the controller up, for a start). It would not even show 
without ACPI running. Now I can use pci=noacpi with this patch series 
and work happily with Thunderbolt.

- I have not given iommu or intel_iommu parameters but I am getting DMAR 
faults (probably because I am using pci=noacpi) but normally the DMAR 
does not come on if you do not ask it to. Is there perhaps something 
recently added to do with Thunderbolt that is activating it? I 
understand that regardless, DMAR does not work well without ACPI. The 
main two I care about are DMAR and MADT (multiple processors) tables and 
otherwise, I would disable ACPI altogether.

Cheers,
Nicholas



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux