Re: [RFC] ARM64 PCI resource survey issue(s)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2019-06-10 at 11:11 +0100, Lorenzo Pieralisi wrote:
> 
> > I agree and I assume the problem stems from BIOSes creating either
> > invalid or incomplete assignments but as I said, I don't know for sure
> > as our platforms dont have that problem.
> 
> The first thing that I would like to agree on is what mechanism the
> kernel has to ensure a BAR resource is not changed by the
> PCI resource assignment mechanism.
> 
> (1) IORESOURCE_PCI_FIXED flag: I have *very* strong feelings that this
>     flag works because x86 kernel code claims all resources on probe
>     (which effectively makes them immutable). It does not work for
>     PCI bridges apertures and I am not sure it works for arches (eg
>     ARM64) that reassign everything, even for *devices*

I'm pretty sure this flag generally doesn't work indeed, see my other
messages on that subject. The only valid use I can see for this flag
honestly is in a context where we do an x86-style claim pass, to
prevent selected resources or bridge windows from being moved or
resized due to conflicts (ie: move the *other* one if there's a
conflict) but it's of very limited value.

> (2) Claiming resources: this basically means assigning a parent to
>     a resource.

Well yes, a claimed resource is immutable, that's a given.

That and of course it's implied that there are no conflicts between
claimed resources and/or the resources have valid existing assigned
values. From my quick look last week, it doesn't seem like
pci_bus_claim_resources() handle those edge cases terribly well.

This is why I'm keen on trying to standardize around the x86 (and
powerpc which is similar) method which deals with these partial
assignments and some broken cases much better.

> Kernel PCI resource assignment code is full of code I do not understand
> and that in my opinion works because of (1). I *tried*, while posting
> ACPI PCI code for ARM64, to do what x86 does:

:-)

> - Claim all resources
> - Reassign the resources that could not be claimed
> 
> This led to:
> 
> a) Spurious dmesg logs (Eg Resources that could not be claimed)
> b) If FW set-up bridge windows, it may lead to reassignment failures
>    if the bridge windows were sized *wrongly* but the kernel still
>    manage to claim them (because they fit in the parent window).
> 
> Hence, we reassign everything on ARM64 on ACPI (except for bus numbers
> and that was, indeed, a mistake).

Hrm... do you have example systems or the ability to give me access to
systems where these problems occur ? The x86 code has some smarts to
deal with BIOSes who didn't quite do the right thing that we might be
able to port over. What I called the "2 pass survey followed by
pci_assign_unassigned_resources". I've implemented something similar
back in the days for powerpc and it served us well there too. I'd like
to give a shot and making this generic and using it on arm64.

> In short, this is a mess and one that I do not have much leeway to fix
> because the PCI resource assignment code is as it is owing to legacy
> that we can't touch unless we want to fix regressions for the next
> year.

Yes and no... We can try to make the x86 method work on arm64 and see
what goes right or wrong. I wouldn't mind adding quirks to recognize
existing broken amd64 firmwares if those really still don't work
properly, if we can help getting future ones better as well.

> And then there is _DSM #5.
> 
> The problem we have is that there is *no* specific way to tell what's
> right or wrong and that's what _DSM #5 is supposed to fix. To implement
> it to the letter it will take lots of work.

Correct but I don't think we need fully "to the letter", it's almost
impossible to support all possible cases of _DSM #5 values at arbitrary
locations in the tree.

> First thing is to answer and agree (1) vs (2) above, aka define what
> an immutable resource is from a kernel point of view, one with a
> parent != NULL OR one with IORESOURCE_PCI_FIXED flag.

Well, I think (2) is a given. The issue however is that a resource
can't be "fixed" in isolation. It's parent need to be sufficiently
"fixed" to enclose that resource, and it's parent parent etc.... This
is why random _DSM #5 on leaf nodes isn't going to fly terribly well if
the FW hasn't already assigned sane windows to bridges.

I think the key here is to try to claim what FW setup iff it doesn't
conflict and looks like it's valid at all (assigned at all), using a
similar kind of 2 pass mechanism as x86 and powrpc that favors already
enabled devices (likely to be used by FW), and (re)assigns everything
else.

I can look at making x86 and powerpc common with reasonable ease as I
wrote a lot of the powerpc code for that. I don't have all the HW at
hand to test anymore but I still know who does :-)

For ARM64, the difficulty is all I have access to at the moment are
Amazon own systems which seem to have a reasonable FW assignment, and
qemu. So I'll need help with testing against the bad guys.

In the meantime, I'm still keen on getting in Ard patch that handles
_DSM #5 at the host bridge level. It will help existing valid use cases
and we are reasonably confident it won't break anything.

Cheers,
Ben.





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux