On Fri, Mar 13, 2020 at 01:26:28PM -0700, Kuppuswamy Sathyanarayanan wrote: > On 3/13/20 12:28 PM, Bjorn Helgaas wrote: > > [+cc Russell, Sam, Oliver since we're talking about the error recovery > > flow. The code we're talking about is at [1]] > > > > On Thu, Mar 12, 2020 at 11:22:13PM -0700, Kuppuswamy, Sathyanarayanan wrote: > > > On 3/12/2020 3:32 PM, Bjorn Helgaas wrote: > > > > On Thu, Mar 12, 2020 at 02:59:15PM -0700, Kuppuswamy Sathyanarayanan wrote: > > > > > On 3/12/20 12:53 PM, Bjorn Helgaas wrote: > > > > > > On Wed, Mar 11, 2020 at 04:07:59PM -0700, Kuppuswamy Sathyanarayanan wrote: > > > > > > > On 3/11/20 3:23 PM, Bjorn Helgaas wrote: > > > > > > > > Is any synchronization needed here between the EDR path and the > > > > > > > > hotplug/enumeration path? > > > > > > > If we want to follow the implementation note step by step (in > > > > > > > sequence) then we need some synchronization between EDR path and > > > > > > > enumeration path. But if it's OK to achieve the same end result by > > > > > > > following steps out of sequence then we don't need to create any > > > > > > > dependency between EDR and enumeration paths. Currently we follow > > > > > > > the latter approach. > > > > > > What would the synchronization look like? > > > > > we might need some way to disable the enumeration path till > > > > > we get response from firmware. > > > > > > > > > > In native hot plug case, I think we can do it in two ways. > > > > > > > > > > 1. Disable hotplug notification in slot ctl registers. > > > > > (pcie_disable_notification()) > > > > > 2. Some how block hotplug driver from processing the new > > > > > events (not sure how feasible its). > > > > > > > > > > Following method 1 would be easy, But I am not sure whether > > > > > its alright to disable them randomly. I think, unless we > > > > > clear the status as well, we might get some issues due to stale > > > > > notification history. > > > > > > > > > > For ACPI event case, I am not sure whether we have some > > > > > communication protocol in place to disable receiving ACPI > > > > > events temporarily. > > > > > > > > > > For polling model, we need to disable to the polling > > > > > timer thread till we receive _OST response from firmware. > > > > > > Ideally I think it would be better to follow the order in the > > > > > > flowchart if it's not too onerous. > > > > > None of the above changes will be pretty and I think it will > > > > > not be simple as well. > > > > > > That will make the code easier to > > > > > > understand. The current situation with this dependency on pciehp and > > > > > > what it will do leaves a lot of things implicit. > > > > > > > > > > > > What happens if CONFIG_PCIE_EDR=y but CONFIG_HOTPLUG_PCI_PCIE=n? > > > > > > > > > > > > IIUC, when DPC triggers, pciehp is what fields the DLLSC interrupt and > > > > > > unbinds the drivers and removes the devices. > > > > > > If that doesn't happen, and Linux clears the DPC trigger to bring > > > > > > the link back up, will those drivers try to operate uninitialized > > > > > > devices? > > > > > I don't think this will happen. In DPC reset_link before we bring up > > > > > the device we wait for link to go down first using > > > > > pcie_wait_for_link(pdev, false) function. > > > > I understand that, but these child devices were reset when DPC > > > > disabled the link. When the link comes back up, their BARs > > > > contain zeros. > > > > > > > > If CONFIG_HOTPLUG_PCI_PCIE=y, the DLLSC interrupt will cause > > > > pciehp to unbind the driver. It seems like the unbind races with > > > > the EDR notify handler. > > > Agree. But even if there is a race condition, after clearing DPC > > > trigger status, if hotplug driver properly removes/re-enumerates the > > > driver then the end result will still be same. There should be no > > > functional impact. > > > > > > > If pciehp unbinds the driver before edr_handle_event() calls > > > > pcie_do_recovery(), this seems fine -- we'll call > > > > dpc_reset_link(), which brings up the link, we won't call any > > > > driver callbacks because there's no driver, and another DLLSC > > > > interrupt will cause pciehp to re-enumerate, which will > > > > re-initialize the device, then rebind the driver. > > > > > > > > If the EDR notify handler runs before pciehp unbinds the driver, > > > In the above case, from the kernel perspective device is still > > > accessible and IIUC, it will try to recover it in pcie_do_recovery() > > > using one of the callbacks. > > > > > > int (*mmio_enabled)(struct pci_dev *dev); > > > int (*slot_reset)(struct pci_dev *dev); > > > void (*resume)(struct pci_dev *dev); > > > > > > One of these callbacks will do pci_restore_state() to restore the > > > device, and IO will not attempted in these callbacks until the device > > > is successfully recovered. > > That might be what *should* happen, but I don't think it's what > > *does* happen. > > > > I don't think we use .mmio_enabled() and .slot_reset() for EDR > > because Linux EDR currently depends on DPC, so we'll be using > > dpc_reset_link(), which normally returns PCI_ERS_RESULT_RECOVERED, > > so pcie_do_recovery() skips .mmio_enabled() and .slot_reset(). > After our discussion about non-hotplug cases, I am thinking > that reset_link() callback should not return > PCI_ERS_RESULT_RECOVERED in non hotplug cases. If > successfully reset-ed, it should return PCI_ERS_RESULT_NEED_RESET. > This will enable pcie_do_recovery() to proceed to .slot_reset() to > successfully recover the device. > > Any comments ? > > > > I looked at the first few .resume() implementations (FWIW, I used [2] > > to find them), and none of them calls pci_restore_state() before doing > > I/O to the device: > > > > ioat_pcie_error_resume() > > pci_resume() (hfi1) > > qib_pci_resume() > > cxl_pci_resume() > > genwqe_err_resume() > > ... > > > > But I assume you've tested EDR with some driver that *does* call > > pci_restore_state()? Or maybe you have pciehp enabled, > Yes. I have tested it only with hotplug enabled. Let me try to disable > hotplug and verify the cases. > > and it always > > wins the race and unbinds the driver before the EDR notification? It > > would be interesting to make pciehp *lose* the race and see if > > anything breaks. > > > > pci-error-recovery.rst does not mention any requirement for the driver > > to call pci_restore_state(), and I think any state restoration like > > that should be the responsibility of the PCI core, not the driver. > > > > > > couldn't EDR bring up the link and call driver .mmio_enabled() before > > > > the device has been initialized? > > > Calling mmio_enabled in this case should not be a problem right? > > > > > > Please check the following content from > > > Documentation/PCI/pci-error-recovery.rst. IIUC (following content), > > > IO will not be attempted until the device is successfully > > > re-configured. > > > > > > STEP 2: MMIO Enabled > > > -------------------- > > > This callback is made if all drivers on a segment agree that they can > > > try to recover and if no automatic link reset was performed by the HW. > > > If the platform can't just re-enable IOs without a slot reset or a link > > > reset, it will not call this callback, and instead will have gone > > > directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset) > > > > > > > If CONFIG_HOTPLUG_PCI_PCIE=n and CONFIG_HOTPLUG_PCI_ACPI=y, I could > > > > believe that the situations are similar to the above. > > > > > > > > What if CONFIG_HOTPLUG_PCI_PCIE=n and CONFIG_HOTPLUG_PCI_ACPI=n? Then > > > > I assume there's nothing to unbind the driver, so pcie_do_recovery() > > > > will call the driver .mmio_enabled() and other recovery callbacks on a > > > > device that hasn't been initialized? > > > probably in .slot_reset() callback device config will be restored and it > > > will make the device functional again. > > I don't think .mmio_enabled() is a problem because IIUC, the device > > should not have been reset before calling .mmio_enabled(). > In hotplug case, it is possible. since reset_link() is called before > .mmio_enabled, the device might be in reset state by the time > .mmio_enabled is called. > > > > But I think .slot_reset() *is* a problem. I looked at several > > .slot_reset() implementations ([3]); some called pci_restore_state(), > > but many did not. > > > > If no hotplug driver is enabled, I think the .slot_reset() callbacks > > that do not call pci_restore_state() are broken. > Yes. Agree. May be the documentation needs to be explicit about it ? Sorry, I got distracted here and lost the flow of the conversation. I haven't been able to think about the synchronization question and your comments above yet. I do think that when pci_restore_state() is required, it should be done by the PCI core, not by the drivers. But I think that's out of scope for this series, so probably a project for later. I made a few of the updates Christoph suggested and updated the review/edr branch. Do you want to start with that as the basis for a v18 posting?