On Fri, Jun 14, 2024 at 01:41:20PM -0500, Bjorn Helgaas wrote: > On Tue, May 28, 2024 at 02:42:00PM +0800, Bitao Hu wrote: > > "present" and "link_active" can be 1 if the status is ready, and 0 if > > it is not. Both of them can be -ENODEV if reading the config space > > of the hotplug port failed. That's typically the case if the hotplug > > port itself was hot-removed. Therefore, this situation can occur: > > pciehp_card_present() may return 1 and pciehp_check_link_active() > > may return -ENODEV because the hotplug port was hot-removed in-between > > the two function calls. In that case we'll emit both "Card present" > > *and* "Link Up" since both 1 and -ENODEV are considered "true". This > > is not the expected behavior. Those messages should be emited when > > "present" and "link_active" are positive. [...] > > --- a/drivers/pci/hotplug/pciehp_ctrl.c > > +++ b/drivers/pci/hotplug/pciehp_ctrl.c > > @@ -276,10 +276,10 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events) > > case OFF_STATE: > > ctrl->state = POWERON_STATE; > > mutex_unlock(&ctrl->state_lock); > > - if (present) > > + if (present > 0) > > I completely agree that this is a problem and this patch addresses it. > But ... > > It seems a little bit weird to me that we even get to this switch > statement if we got -ENODEV from either pciehp_card_present() or > pciehp_check_link_active(). If that happens, a config read failed, > but we're going to go ahead and call pciehp_enable_slot(), which is > going to do a bunch more config accesses, potentially try to power up > the slot, etc. > > If a config read failed, it seems like we might want to avoid doing > some of this stuff. Hm, good point. I guess we should change the logical expression instead: - if (present <= 0 && link_active <= 0) { + if (present < 0 || link_active < 0 || (!present && !link_active)) { > > - if (link_active) > > + if (link_active > 0) > > ctrl_info(ctrl, "Slot(%s): Link Up\n", > > slot_name(ctrl)); > > These are cases where we misinterpreted -ENODEV as "device is present" > or "link is active". > > pciehp_ignore_dpc_link_change() and pciehp_slot_reset() also call > pciehp_check_link_active(), and I think they also interpret -ENODEV as > "link is active". > > Do we need similar changes there? Another good observation, both need to check for <= 0 instead of == 0. Do you want to fix that yourself or would you prefer me (or someone else) to submit a patch? Thanks, Lukas