Re: [PATCH pci] PCI: don't skip probing entire device if first fn OF node has status = "disabled"

Vladimir Oltean <vladimir.oltean@xxxxxxx> · Thu, 1 Jun 2023 19:33:35 +0300

On Thu, Jun 01, 2023 at 10:44:45AM -0500, Bjorn Helgaas wrote:
> To make sure I understand you, I think you're saying that if Function
> 0 has DT status "disabled", 6fffbc7ae137 ("PCI: Honor firmware's
> device disabled status") breaks things because we don't enumerate
> Function 0 and the driver can't temporarily claim it to zero out its
> piece of the shared memory.
> 
> With just 6fffbc7ae137, we don't enumerate Function 0, which means we
> don't see that it's a multi-function device, so we don't enumerate
> Functions 1, 2, etc, either.
> 
> With both 6fffbc7ae137 and your current patch, we would enumerate
> Functions 1, 2, etc, but we still skip Function 0, so its piece of the
> shared memory still doesn't get zeroed.

I'm saying that as long as commit 6fffbc7ae137 ("PCI: Honor firmware's
device disabled status") exists in the form where the pci_driver :: probe()
is completely skipped for disabled functions, the NXP ENETC PCIe device
has a problem no matter what the function number is. That problem is:
the device drivers of all PCIe functions need to clear some memory
before they ultimately fail to probe (as they should), because of some
hardware design oversight. That is no longer possible if the driver has
no hook to execute code for those devices that are disabled.

On top of that, function 0 having status = "disabled" is extra
problematic, because the PCI core will now just assume that functions 1 .. N
don't exist at all, which is simply false, because the usefulness of
ENETC port 0 (PCIe function 0) from a networking perspective is
independent from the usefulness of ENETC port 1 (PCIe function 1), ENETC
port 2 etc.

> 
> > The ENETC is not a hot-pluggable PCIe device. It uses Enhanced Allocation
> > to essentially describe on-chip memory spaces, which are always present.
> > So presumably, a different system-level solution to initialize those
> > shared memories (U-Boot?) may be chosen, if implementing this workaround
> > in Linux puts too much pressure on the PCIe core and the way in which it
> > does things. Initially I didn't want to do this in prior boot stages
> > because we only enable the RCEC in Linux, nothing is broken other than
> > the spurious AER messages, and, you know.. the kernel may still run
> > indefinitely on top of bootloaders which don't have the workaround applied.
> > So working around it in Linux avoids one dependency.
> 
> If I understand correctly, something (bootloader or Linux) needs to do
> something to Function 0 (e.g., clear memory).

To more than just function 0 (also 1, 2 and 6). There are 2 confounding
problems, the latter being something that was exposed by your question:
what will happen that's bad with the current mainline code structure,
*notwithstanding* the fact that function 0 may have status = "disabled"
(which currently will skip enumeration for the rest of the functions
which don't have status = "disabled").

> Doing it in Linux would minimize dependences on the bootloader, so
> that seems desirable to me. That means Linux needs to enumerate
> Function 0 so it is visible to a driver or possibly a quirk.

Uhm... no, that wouldn't be enough. Only a straight revert would satisfy
the workaround that we currently have for NXP ENETC in Linux.

Also, I'm not sure if it was completely reasonable of me in the first
place to exploit this quirk of the Linux PCI bus - that the probe
function is called even if a device is disabled in the device tree.
I would understand if I was forced to rethink that.

> I think we could contemplate implementing 6fffbc7ae137 in a different
> way.  Checking DT status at driver probe-time would probably work for
> Loongson, but wouldn't quite solve the NXP problem because the driver
> wouldn't be able to claim Function 0 even temporarily.

Not sure what you mean by "checking DT status at driver probe-time".
Does enetc_pf_probe() -> of_device_is_available() qualify? You probably
mean earlier than that.

My problem is that I don't really understand what was the functional
need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
status") in the first place, considering that any device driver can
already fail to probe based on the same condition at its own will.

> Is DT the only way to learn the NXP SERDES configuration?  I think it
> would be much better if there were a way to programmatically learn it,
> because then you wouldn't have to worry about syncing the DT with the
> platform configuration, and it would decouple this from the Loongson
> situation.

Syncing the DT with the platform configuration will always be necessary,
because for networking we will also need extra information which is
completely non-discoverable, like a phy-handle or such, and that depends
on the wiring and static pinmuxing of the SoC. So it is practically
reasonable to expect that what is usable has status = "okay", and what
isn't has status = "disabled". Not to mention, there are already device
trees in circulation which are written that way, and those need to
continue to work.

> (If there were a way to actually discover the Loongson situation
> instead of relying on DT, e.g., by keying off a Device ID or
> something, that would be much better, too.  I assume we explored that,
> but I don't remember the details.)
> 
> Bjorn

What is it that's special about the Loongson situation?