arm64 PCI resource allocation issue

Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> · Tue, 02 Aug 2022 14:07:00 +1000

Hi Folks !

It's been a while ... (please extend the CC list as needed)

I'd like to re-open an ancient issue because it's still biting us
(AWS).

A few years back, I updated the PCIe resource allocation code to be a
bit more in line with what other architectures do. That said, once
thing we couldn't agree on was to do like x86 and default to preserving
the firmware provided resources by default.

On x86, the kernel "allocates" (claims) the resources (unless it finds
something obviously wrong), then allocates anything left unallocated.

On arm64, we use to just re-allocate everything. I changed this to
first use some more common code for doing all that, but also to have
the option to claim existing resources if _DSM tells us to preserve
them for a given host brigde.

I still think this is the wrong way to go and that we should preserve
the UEFI resources by default unless told not to :-)

The case back then was that there existed some (how many ? there was
one real example if I remember correctly) bogus firwmares that came out
of UEFI with too small windows. We could just quirk those ....

The reason I'm bringing this back is that re-allocating resources for
system devices cause problems.

The most obvious one today that is affecting EC2 instances is that the
UART address specified in SPCR is no longer valid, causing issues
ranging from the console not working to MMIO to what becomes "random
addresses". Typically today this is "worked around" by using
console=ttyS0 to force selection of the first detected PCI UART,
because the match against SPCR is based on address and it won't match,
but there's always the underlying risk that things like earlycon starts
poking at now-incorrect addresses until 8250 takes over.

This is the most obvious problem. Any other "system" device that
happens to be PCIe based (anything detected early, via device-tree,
ACPI or otherwise) is at risk of a similar issue. On x86 that could be
catastrophic because near everything looks like a PCI device, on arm64
we seem to have been getting away with it a bit more easily ... so far.

The alternative here would be to use ad-hock kludges for such system
devices, to "register" the addresses early, and have some kind of hook
in the PCI code that keeps track of them as they get remapped.

If we want this, I would propose (happy to provide the implementation
but let's discuss the design first) something along the line of a
generic mechanism to "register" such a system device, which would add
it to a list. That list would be scanned on PCI device discovery for
BAR address matches, and the pci_dev/BAR# added to the entry (that or
put a pointer to the entry into pci_dev for speed/efficiency).

The difficulty is how is that update propagated:

This is of course fiddly. For example, the serial info is passed via
two different ways, one being earlycon (and probably the easiest to
track), the other one an ASCII string passed to
add_preferred_console(), which would require more significant hackery
(the code dealing with console mathing is a gothic horror).

Also if such a system device is in continuous use during the boot
process (UART ?) it needs to be "updated" as soon as possible after the
BARs (and parent BARs) have been also updated (in fact this is
generally why PCI debug dies horribly when using PCI based UARTs).
Maybe an (optional) callback that earlycon can add ?

Additionally, this would only work if such system devices are
"registered" before they get remapped... 

Another approach would be to have pci_dev keep a copy of the original
resources (at least for the primary BARs) and provide an accessor for
use by things like earlycon or 8250 to compare against these, though
that doesn't solve the problem of promptly "updating" drivers for
system devices.

Opinions ?

Cheers,
Ben.