Re: Archs using generic PCI controller drivers vs. resource policy

Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> · Wed, 03 Jul 2019 10:16:49 +1000

On Tue, 2019-07-02 at 15:19 -0500, Bjorn Helgaas wrote:
> On Sun, Jun 23, 2019 at 10:30:42AM +1000, Benjamin Herrenschmidt
> wrote:
> > Hi !
> > 
> > As part of my cleanup and consolidation of the PCI resource
> > assignment
> > policies, I need to clarify something.
> > 
> > At the moment, unless PCI_PROBE_ONLY is set, a number of
> > platforms/archs expect Linux to reassign everything rather than
> > honor
> > what has setup, then (re)assign what's left or broken.
> 
> I don't think "reassign everything" is any different from "honor what
> has been setup and (re)assign what's left or broken".

No it actually is. The policy on these is to rather explicitely ignore
what was set. If you just switch to honoring it, a good number of those
platforms will break. (We know that happens on arm64 as we are trying
to do just that).

Part of the problem is it's not always easy to figure out whether the
existing setup is "broken". It could just be "very suboptimal" for
example. Or broken in ways we don't detect early enough to do a good
job about it.

I really wouldn't try to change that at this point. Those platforms are
happy with Linux doing the complete setup the way it does, I don't see
areason to change it. In fact, in some cases we can do even better (see
below).

> "Reassign everything" is clearly allowed to produce the exact same
> result as "honor what has been setup and (re)assign what's left or
> broken".

Provided we have some IA that can figure out what "broken" means :)

> I claim any difference between the two is actually a fragile
> dependency on the Linux assignment algorithm that is likely to break
> as that algorithm changes.
> 
> Or am I missing something?

Well, reassign-everything isn't that likely to break when Linux changes
as long as linux doesn't change in ways that are completely busted
anyway. It's a pretty simple process really. And that would be caught
quickly since a LOT of platforms use that method.

As for the reasons, well, this tends to be embedded platforms where the
firmware may have left crap behind that we really don't want to honor
and can't really detect as "broken".

For example, switches sized incorrectly, or too big, sub optimal
placement with everything in 32-bit space and no room left for SR-IOV,
etc...

In fact, I'm thinking that on these platforms (which are basically
*all* arm32, all embedded ppc, and pretty much all other non-
server/desktop archs that aren't x86), we could in fact go one step
further and improve the default algorithm by applying the "distribute
available space" method at the top level (we currently only do it for
hotplug bridges below that, so the top level bridges will get the
default which, for hotplug, isn't likely to be enough).

I plan to toy around once I'm done with the first stage of
consolidation (which I put on the backburner for a little bit as I have
to deal with something else first, but I'll get back to it soon).

Cheers,
Ben.