Hi Bjorn, > On Nov 30, 2020, at 4:25 PM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > > On Mon, Nov 30, 2020 at 07:54:37PM +0000, Kelley, Sean V wrote: >>> On Nov 24, 2020, at 9:17 AM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: >>> On Mon, Nov 23, 2020 at 11:57:35PM +0000, Kelley, Sean V wrote: >>>>> On Nov 23, 2020, at 3:28 PM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: >>>>> On Fri, Nov 20, 2020 at 04:10:31PM -0800, Sean V Kelley wrote: >>>>>> In some cases a bridge may not exist as the hardware controlling may be >>>>>> handled only by firmware and so is not visible to the OS. This scenario is >>>>>> also possible in future use cases involving non-native use of RCECs by >>>>>> firmware. >>>>>> >>>>>> Explicitly apply conditional logic around these resets by limiting them to >>>>>> Root Ports and Downstream Ports. >>>>> >>>>> Can you help me understand this? The subject says "Limit AER resets" >>>>> and here you say "limit them to RPs and DPs", but it's not completely >>>>> obvious how the resets are being limited, i.e., the patch doesn't add >>>>> anything like: >>>>> >>>>> + if (type == PCI_EXP_TYPE_ROOT_PORT || >>>>> + type == PCI_EXP_TYPE_DOWNSTREAM) >>>>> reset_subordinates(bridge); >>>>> >>>>> It *does* add checks around pcie_clear_device_status(), but that also >>>>> includes RC_EC. And that's not a reset, so I don't think that's >>>>> explicitly mentioned in the commit log. >>>> >>>> The subject should have referred to the clearing of the device status rather than resets. >>>> It originally came from this simpler patch in which I made use of reset instead of clear: >>>> >>>> https://lore.kernel.org/linux-pci/20201002184735.1229220-8-seanvk.dev@xxxxxxxxxxxxxxxx/ >>>> >>>> So a rephrase of clearing in place of resets would be more appropriate. >>>> >>>> Then we added the notion of bridges…below >>>> >>>>> >>>>> Also see the question below. >>>>> >>>>>> Link: https://lore.kernel.org/r/20201002184735.1229220-8-seanvk.dev@xxxxxxxxxxxxxxxx >>>>>> Signed-off-by: Sean V Kelley <sean.v.kelley@xxxxxxxxx> >>>>>> Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> >>>>>> Acked-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> >>>>>> --- >>>>>> drivers/pci/pcie/err.c | 31 +++++++++++++++++++++++++------ >>>>>> 1 file changed, 25 insertions(+), 6 deletions(-) >>>>>> >>>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >>>>>> index 8b53aecdb43d..7883c9791562 100644 >>>>>> --- a/drivers/pci/pcie/err.c >>>>>> +++ b/drivers/pci/pcie/err.c >>>>>> @@ -148,13 +148,17 @@ static int report_resume(struct pci_dev *dev, void *data) >>>>>> >>>>>> /** >>>>>> * pci_walk_bridge - walk bridges potentially AER affected >>>>>> - * @bridge: bridge which may be a Port >>>>>> + * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs, >>>>>> + * or an RCiEP associated with an RCEC >>>>>> * @cb: callback to be called for each device found >>>>>> * @userdata: arbitrary pointer to be passed to callback >>>>>> * >>>>>> * If the device provided is a bridge, walk the subordinate bus, including >>>>>> * any bridged devices on buses under this bus. Call the provided callback >>>>>> * on each device found. >>>>>> + * >>>>>> + * If the device provided has no subordinate bus, call the callback on the >>>>>> + * device itself. >>>>>> */ >>>>>> static void pci_walk_bridge(struct pci_dev *bridge, >>>>>> int (*cb)(struct pci_dev *, void *), >>>>>> @@ -162,6 +166,8 @@ static void pci_walk_bridge(struct pci_dev *bridge, >>>>>> { >>>>>> if (bridge->subordinate) >>>>>> pci_walk_bus(bridge->subordinate, cb, userdata); >>>>>> + else >>>>>> + cb(bridge, userdata); >>>>>> } >>>>>> >>>>>> pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>>>> @@ -174,10 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>>>> >>>>>> /* >>>>>> * Error recovery runs on all subordinates of the bridge. If the >>>>>> - * bridge detected the error, it is cleared at the end. >>>>>> + * bridge detected the error, it is cleared at the end. For RCiEPs >>>>>> + * we should reset just the RCiEP itself. >>>>>> */ >>>>>> if (type == PCI_EXP_TYPE_ROOT_PORT || >>>>>> - type == PCI_EXP_TYPE_DOWNSTREAM) >>>>>> + type == PCI_EXP_TYPE_DOWNSTREAM || >>>>>> + type == PCI_EXP_TYPE_RC_EC || >>>>>> + type == PCI_EXP_TYPE_RC_END) >>>>>> bridge = dev; >>>>>> else >>>>>> bridge = pci_upstream_bridge(dev); >>>>>> @@ -185,6 +194,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>>>> pci_dbg(bridge, "broadcast error_detected message\n"); >>>>>> if (state == pci_channel_io_frozen) { >>>>>> pci_walk_bridge(bridge, report_frozen_detected, &status); >>>>>> + if (type == PCI_EXP_TYPE_RC_END) { >>>>>> + pci_warn(dev, "subordinate device reset not possible for RCiEP\n"); >>>>>> + status = PCI_ERS_RESULT_NONE; >>>>>> + goto failed; >>>>>> + } >>>>>> + >>>>>> status = reset_subordinates(bridge); >>>>>> if (status != PCI_ERS_RESULT_RECOVERED) { >>>>>> pci_warn(bridge, "subordinate device reset failed\n"); >>>>>> @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>>>> pci_dbg(bridge, "broadcast resume message\n"); >>>>>> pci_walk_bridge(bridge, report_resume, &status); >>>>>> >>>>>> - if (pcie_aer_is_native(bridge)) >>>>>> - pcie_clear_device_status(bridge); >>>>>> - pci_aer_clear_nonfatal_status(bridge); >>>>>> + if (type == PCI_EXP_TYPE_ROOT_PORT || >>>>>> + type == PCI_EXP_TYPE_DOWNSTREAM || >>>>>> + type == PCI_EXP_TYPE_RC_EC) { >>>>>> + if (pcie_aer_is_native(bridge)) >>>>>> + pcie_clear_device_status(bridge); >>>>>> + pci_aer_clear_nonfatal_status(bridge); >>>>> >>>>> This is hard to understand because "type" is from "dev", but "bridge" >>>>> is not necessarily the same device. Should it be this? >>>>> >>>>> type = pci_pcie_type(bridge); >>>>> if (type == PCI_EXP_TYPE_ROOT_PORT || >>>>> ...) >>>> >>>> Correct, it would be better if the type was based on the ‘bridge’. >>> >>> OK. This is similar to >>> https://lore.kernel.org/linux-pci/20201002184735.1229220-8-seanvk.dev@xxxxxxxxxxxxxxxx/, >>> which you cited above except for the bridge/dev question and the >>> addition here of RC_EC. >>> >>> I tried to split that back into its own patch and started with the >>> commit message from that patch. But I got stuck on the commit >>> message. I got as far as: >>> >>> In some cases an error may be reported by a device not visible to >>> the OS, e.g., if firmware manages the device and passes error >>> information to the OS via ACPI APEI. >>> >>> But I still can't quite connect that to the patch. "bridge" is >>> clearly a device visible to Linux. >>> >>> I guess we're trying to assert that if "bridge" is not a Root Port, >>> Downstream Port, or RCEC, we shouldn't clear the error status because >>> the error came from a device Linux doesn't know about. But I think >>> "bridge" is *always* either a Root Port or a Downstream Port: >> >> That’s ultimately what we are trying to do. >> >>> if (type == PCI_EXP_TYPE_ROOT_PORT || >>> type == PCI_EXP_TYPE_DOWNSTREAM) >>> bridge = dev; >>> else >>> bridge = pci_upstream_bridge(dev); >>> >>> pci_upstream_bridge() returns either NULL (in which case previous uses >>> dereference a NULL pointer), or dev->bus->self, which is always a Root >>> Port, Switch Downstream Port, or Switch Upstream Port (or NULL for the >>> special case of VFs). >> >> In the past recall we were augmenting it with bridge = dev->rcec for >> RC_END. But we were able to relocate the handling in >> aer_root_reset(). >> >> So in this patch - we add the conditionals because RC_END is being >> passed in addition to RC_EC. >> >> if (type == PCI_EXP_TYPE_ROOT_PORT || >> >> - type == PCI_EXP_TYPE_DOWNSTREAM) >> >> + type == PCI_EXP_TYPE_DOWNSTREAM || >> + type == PCI_EXP_TYPE_RC_EC || >> + type == PCI_EXP_TYPE_RC_END) >> >> bridge = dev; >> else >> bridge = pci_upstream_bridge(dev); >> >> So we need to check for RP, DS, and RC_EC >> >> @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> >> pci_dbg(bridge, "broadcast resume message\n"); >> pci_walk_bridge(bridge, report_resume, &status); >> >> >> - if (pcie_aer_is_native(bridge)) >> - pcie_clear_device_status(bridge); >> - pci_aer_clear_nonfatal_status(bridge); >> >> + if (type == PCI_EXP_TYPE_ROOT_PORT || >> + type == PCI_EXP_TYPE_DOWNSTREAM || >> + type == PCI_EXP_TYPE_RC_EC) { >> + if (pcie_aer_is_native(bridge)) >> + pcie_clear_device_status(bridge); >> + pci_aer_clear_nonfatal_status(bridge); >> + } >> >> Breaking out a separate patch would be unnecessary as you correctly >> point out that it’s only going to be an RP or DS before this patch. > > Still trying to sort this out in my head, so half-baked questions > before I quit for the day: We call pcie_do_recovery() in four paths: > AER, APEI, DPC, and EDR, and I'm trying to understand what "dev" we > pass in all those cases. > > For DPC, I think "dev" must be a downstream port that triggered DPC, > and its secondary link is disabled. The device and any siblings have > already been reset by the link going down. We want to clear AER > status in downstream device(s) after they come back up (the AER status > bits are sticky, so they're not cleared by the reset), and the AER > status in the downstream port. > > I think EDR is the same as DPC? > > For AER, I think "dev" will typically (maybe always?) be the device > that detected the error and logged it in its AER Capability, not the > Root Port or RCEC that generated the interrupt. We want to reset > "dev" and any siblings, clear AER status in "dev", and clear AER > status in the Root Port or RCEC. > > For APEI, I assume "dev" is typically the device that detected the > error, and we want to reset it and any siblings. Firmware has already > read the AER status for "dev", and I assume firmware also clears it. > I assume firmware is also responsible for clearing AER status in the > Root Port, RCEC, or non-architected device that generated the > interrupt. > > It seems like there are basically two devices of interest in > pcie_do_recovery(): the endpoint where we have to call the driver > error recovery, and the port that generated the interrupt. I wonder > if this would make more sense if the caller passed both of them in > explicitly instead of having pcie_do_recovery() check the type of > "dev" and try to figure things out after the fact. On this last point I wanted to add that this is a possibility that could provide a clearer distinction, especially where actions need to be taken or not taken as a part of pcie_do_recovery(), i.e., bridge versus dev. In this patch series we have taken steps to minimize the need for the distinction by pushing the awareness into the driver’s error recovery routine, i.e., dev->rcec. A future evolution after this series could lead to both devices of interest being passed explicitly for the larger scope EDR/DPC/AER/etc. Thanks, Sean > > Bjorn