On Fri, 14 Jun 2019 at 01:07, Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote: > > On Thu, 2019-06-13 at 14:02 -0500, Bjorn Helgaas wrote: > > > > PCI FW r3.2 says 0 means "the OS must not ignore config done by > > firmware." That means we must keep the FW configuration intact. > > So I tried implementing what you seem to want and it doesn't make sense > imho. > > I think the wording of the spec is terrible and doesn't convey the > intent here. > > What is it that _DSM #5 is about ? Is it about telling us that the FW > config shall not be trusted ? That seem to be its original intent based > on the existing wording and the fact that "1" says "may ignore". > > Or was it always intended to be some kind of inverted logic with 0 > meaning that we *must* preserve what FW did ? > The original purpose was for firmware running on 64-bit hosts to not create a PCI resource assignment that was incompatible with the OS booting in 32-bit mode. So the expectation was that a 32-bit OS would reuse whatever config the firmware created, and the 64-bit version would be enlightened to understand that it could reassign resource assignments to make better use of the available resource windows, but this required a mechanism to describe which resources should be left alone by the OS. So I don't think anybody cares about that use case anymore, and I have no idea how widespread its use was when people did. > But preserving what FW did was always the default for x86 and > Windows... It's just that we happen to do something wrong today on > Linux/ARM64 which is to always reassign everything. > > The way Linux resource assignment works accross platforms has generally > been based on one of these 3 methods: > > - The standard x86 method, which is to claim what's there when it > doesn't look completely busted and has been assigned, then assign > what's left. This allows for FW doing partial assignment, and allows to > work around a number of BIOS bugs. > > - The "probe only" method. This was created independently on powerpc > and some other archs afaik. At least for powerpc, the reason for that > is some interesting virtualization cases where we just cannot touch or > change or move anything. The effect is to not reassign even what we > dont like, and not call pci_assign_unassign_resources(). > > - The "reassign everything" method. This is used by almost all > embedded patforms accross archs. All arm32, all arm64 today (but we > agree that's wrong), all embedded powerpc etc... This is basically > meant for us not trusting whatever random uboot or other embedded FW, > if any, and do a full from-scratch assignment. There are issues in how > that is implemented accross the various platforms/archs, some for > example still honor existing bus numbers and some don't, but I doubt > it's intentional etc... but that method is there to stay. > > Now, the questions are two fold > > - How do we map _DSM #5 to these, at least on arm64, maybe in the > long run we can also look at affecting x86, but that's less urgent. > > - How do I ensure the above fixes my Amazon platform ? :-) > It would help if you could explain what exactly is wrong with your Amazon platform :-) > There's one obvious thing here. If we don't want to break existing > things, then the absence of _DSM #5 must not change our existing > behaviour. I think we can all agree on this. > > We might explore changing arm64 default behaviour as a second step > since we all agree it's not doing what it should, but we also know that > it will probably break some things so we need to be careful, understand > the issues and work around them. This isn't the scope of the initial > _DSM #5 patch. > > That leaves us with the _DSM #5 present cases. > > Now we have two values. What do they mean ? As I already said before, > the wording with "must not ignore" and "may ignore" is completely > useless and doesn't tell us a thing about the intention here. We don't > know why the FW folks may have put a given value here, and what they > expect us to do about it. > > What we do know is that Seattle returns 1 and needs reassignment, and > we do know that the Amazon platforms return 0 and will want us to not > touch the existing setup. > > However, does 1 means "business as usual" or does it mean "reassign > everything" ? > > Does 0 means "probe only" ? Or do we still do an assignment pass for > things that the FW may have left unassigned ? > > Today in Linux, the "probe only" logic tends to not call > pci_assign_unassigned_resources for example. > > From a pure reading of the spec, there's an argument to be made that > both 0 and 1 values can lead to the same code that reads what's there > and reassign what's missing. > > So this is a mess, a usual when it comes to specs written by > committees, but at this stage I'm at a loss as to what you want me to > do. >