Hi Keith, On Tue, Oct 11, 2016 at 03:39:07PM -0400, Keith Busch wrote: > The pciehp control only clears the saved power fault detected if power > controller control is present, but there is no requirement that the > capability exist in order to observe power faults. This means a hot-added > device in a slot that had experienced a power fault at some point would > never be able to add a new device since the power fault detected would > be on permanently. > > This patch sets the sticky field only if there is a chance it can > be cleared later. I agree we should handle power_fault_detected consistently with respect to POWER_CTRL(). We currently clear power_fault_detected in pciehp_power_on_slot(), but we only call that if we have a power controller. But we set power_fault_detected in pciehp_isr() always, resulting in this "sticky" situation. I don't think it's 100% clear from the spec whether power faults can be detected without a power controller. All the "power fault" references are in the context of a power controller, e.g., r3.0 sec 6.7.1.8, 7.8.10, 7.8.11. But regardless, we're certainly doing the wrong thing right now. If we do have a power controller, it's fairly clear what we should do: - Power off the slot (even though the hardware is supposed to remove power automatically, sec 6.7.1.8 suggests that software should turn off the power), - Wait at least one second, per 6.7.1.8 (I think we do the power off and wait in the board_added() path, but not in the INT_POWER_FAULT path where we handle asynchronous events), - It looks like we currently turn on the attention indicator and turn off the power indicator (the INT_POWER_FAULT case in interrupt_event_handler()), - Wait for another slot event. If we *don't* have a power controller, we currently set power_fault_detected and log a message, but nothing else. > Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx> > --- > drivers/pci/hotplug/pciehp_hpc.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c > index b57fc6d..b083e1c 100644 > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -631,7 +631,8 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) > > /* Check Power Fault Detected */ > if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) { > - ctrl->power_fault_detected = 1; > + if (POWER_CTRL(ctrl)) > + ctrl->power_fault_detected = 1; > ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(slot)); > pciehp_queue_interrupt_event(slot, INT_POWER_FAULT); I think this is correct as far as it goes, but I think we should do a little more: - When there's no power controller, we can't do anything to resolve a power fault, so I think the ctrl_err() should be rate-limited, - We can't turn off the power, but we could turn on the attention indicator, e.g., by making the INT_POWER_FAULT case look like this: case INT_POWER_FAULT: set_slot_off(ctrl, p_slot); break; -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html