Re: [PATCHv2 15/20] PCI/pciehp: Fix powerfault detection order

Lukas Wunner <lukas@xxxxxxxxx> · Fri, 7 Sep 2018 22:26:31 +0200

On Fri, Sep 07, 2018 at 03:03:32PM -0500, Bjorn Helgaas wrote:
> I applied this to for-linus with the following changelog.  Let me know
> if I didn't understand this correctly.  I changed the comment in
> pciehp_power_on_slot() so it doesn't say "sticky" to avoid confusion
> with the PCI spec concept of sticky register bits (ROS, RWS, RW1CS).

The edited changelog and patch look perfectly fine to me, thanks a lot
and sorry for missing this when doing the rework.  (I missed it because
Thunderbolt doesn't have a power controller, hence can't signal a power
fault.)

The "sticky" refers to the property that if a power fault occurs and
the Power Fault Detected bit is cleared to acknowledge receipt of the
event, and if the power fault persists, the bit is immediately set
again and another interrupt is signaled.  In that sense, the bit is
"sticky" and that's what the code comment was referring to.  It's
basically level-triggered as long as the power fault persists.

pciehp does not clear the bit on receipt of a PFD event, but only sets
a flag in its internal struct.  This avoids an interrupt storm.
Both the bit and the internal flag are cleared when attempting to bring
the slot up again, either through an unplug-replug operation by the user
or an enable request via sysfs or an Attention Button press.  In either
case user intervention is required.  If the power fault is still not gone,
bringup of the slot is aborted.

The problem here was not only that the LED is turned off despite the slot
being brought up, but that the internal flag ctrl->power_fault_detected
was incorrectly set to 1 even though it had just been set to 0 when
successfully bringing up the slot.

There are some oddities with the power fault handling code, such as a
"TBD" code comment in pcie_enable_notification() where it's unclear if
there's really anything left "to be done".  I collected this and other
oddities in this e-mail:
https://www.spinics.net/lists/linux-pci/msg75743.html

Thanks,

Lukas