Re: [PATCH pci] PCI: don't skip probing entire device if first fn OF node has status = "disabled"

Bjorn Helgaas <helgaas@xxxxxxxxxx> · Thu, 1 Jun 2023 12:51:39 -0500

On Thu, Jun 01, 2023 at 07:33:35PM +0300, Vladimir Oltean wrote:
> On Thu, Jun 01, 2023 at 10:44:45AM -0500, Bjorn Helgaas wrote:
> > To make sure I understand you, I think you're saying that if Function
> > 0 has DT status "disabled", 6fffbc7ae137 ("PCI: Honor firmware's
> > device disabled status") breaks things because we don't enumerate
> > Function 0 and the driver can't temporarily claim it to zero out its
> > piece of the shared memory.
> > 
> > With just 6fffbc7ae137, we don't enumerate Function 0, which means we
> > don't see that it's a multi-function device, so we don't enumerate
> > Functions 1, 2, etc, either.
> > 
> > With both 6fffbc7ae137 and your current patch, we would enumerate
> > Functions 1, 2, etc, but we still skip Function 0, so its piece of the
> > shared memory still doesn't get zeroed.
> 
> I'm saying that as long as commit 6fffbc7ae137 ("PCI: Honor firmware's
> device disabled status") exists in the form where the pci_driver :: probe()
> is completely skipped for disabled functions, the NXP ENETC PCIe device
> has a problem no matter what the function number is.

Yep.

> That problem is:
> the device drivers of all PCIe functions need to clear some memory
> before they ultimately fail to probe (as they should), because of some
> hardware design oversight. That is no longer possible if the driver has
> no hook to execute code for those devices that are disabled.

Yep.  If there's no pci_dev, there's no nice way to do anything to the
device.

> On top of that, function 0 having status = "disabled" is extra
> problematic, because the PCI core will now just assume that functions 1 .. N
> don't exist at all, which is simply false, because the usefulness of
> ENETC port 0 (PCIe function 0) from a networking perspective is
> independent from the usefulness of ENETC port 1 (PCIe function 1), ENETC
> port 2 etc.

Yes.

> > > The ENETC is not a hot-pluggable PCIe device. It uses Enhanced Allocation
> > > to essentially describe on-chip memory spaces, which are always present.
> > > So presumably, a different system-level solution to initialize those
> > > shared memories (U-Boot?) may be chosen, if implementing this workaround
> > > in Linux puts too much pressure on the PCIe core and the way in which it
> > > does things. Initially I didn't want to do this in prior boot stages
> > > because we only enable the RCEC in Linux, nothing is broken other than
> > > the spurious AER messages, and, you know.. the kernel may still run
> > > indefinitely on top of bootloaders which don't have the workaround applied.
> > > So working around it in Linux avoids one dependency.
> > 
> > If I understand correctly, something (bootloader or Linux) needs to do
> > something to Function 0 (e.g., clear memory).
> 
> To more than just function 0 (also 1, 2 and 6).

Yes.

> There are 2 confounding
> problems, the latter being something that was exposed by your question:
> what will happen that's bad with the current mainline code structure,
> *notwithstanding* the fact that function 0 may have status = "disabled"
> (which currently will skip enumeration for the rest of the functions
> which don't have status = "disabled").
> 
> > Doing it in Linux would minimize dependences on the bootloader, so
> > that seems desirable to me. That means Linux needs to enumerate
> > Function 0 so it is visible to a driver or possibly a quirk.
> 
> Uhm... no, that wouldn't be enough. Only a straight revert would satisfy
> the workaround that we currently have for NXP ENETC in Linux.

I guess you mean a revert of 6fffbc7ae137?  This whole conversation is
about whether we can rework 6fffbc7ae137 to work both for Loongson and
for you, so nothing is decided yet.

The point is, I assume you agree that it's preferable if we don't have
to depend on a bootloader to clear the memory.

> Also, I'm not sure if it was completely reasonable of me in the first
> place to exploit this quirk of the Linux PCI bus - that the probe
> function is called even if a device is disabled in the device tree.
> I would understand if I was forced to rethink that.

After 6fffbc7ae137, the probe function is not called if the device is
disabled in DT because there's no pci_dev for it at all.

> > I think we could contemplate implementing 6fffbc7ae137 in a different
> > way.  Checking DT status at driver probe-time would probably work for
> > Loongson, but wouldn't quite solve the NXP problem because the driver
> > wouldn't be able to claim Function 0 even temporarily.
> 
> Not sure what you mean by "checking DT status at driver probe-time".
> Does enetc_pf_probe() -> of_device_is_available() qualify? You probably
> mean earlier than that.

I was thinking about something in pci_device_probe(), e.g., by
extending pci_device_can_probe().  But again, we're just exploring the
solution space; I'm not saying this is the best or only path.

> My problem is that I don't really understand what was the functional
> need for commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
> status") in the first place, considering that any device driver can
> already fail to probe based on the same condition at its own will.

In general, PCI drivers shouldn't rely on DT.  If the bus driver (PCI
in this case) calls a driver's probe function, the driver can assume
the device exists.  But enetc is not a general-purpose driver, and if
DT is the only way to discover this property, I guess you're stuck
doing that.

> > Is DT the only way to learn the NXP SERDES configuration?  I think it
> > would be much better if there were a way to programmatically learn it,
> > because then you wouldn't have to worry about syncing the DT with the
> > platform configuration, and it would decouple this from the Loongson
> > situation.
> 
> Syncing the DT with the platform configuration will always be necessary,
> because for networking we will also need extra information which is
> completely non-discoverable, like a phy-handle or such, and that depends
> on the wiring and static pinmuxing of the SoC. So it is practically
> reasonable to expect that what is usable has status = "okay", and what
> isn't has status = "disabled". Not to mention, there are already device
> trees in circulation which are written that way, and those need to
> continue to work.

Just because we need DT for non-discoverable info A doesn't mean we
should depend on it for B if B *is* discoverable.

This question of disabling a device via DT but still needing to do
things to the device is ... kind of a sticky wicket.

Maybe this should be a different DT property (not "status").  Then PCI
enumeration could work normally and 6fffbc7ae137 wouldn't be in the
way.

> > (If there were a way to actually discover the Loongson situation
> > instead of relying on DT, e.g., by keying off a Device ID or
> > something, that would be much better, too.  I assume we explored that,
> > but I don't remember the details.)
> 
> What is it that's special about the Loongson situation?