On Thu, Mar 12, 2020 at 02:59:15PM -0700, Kuppuswamy Sathyanarayanan wrote: > Hi Bjorn, > > On 3/12/20 12:53 PM, Bjorn Helgaas wrote: > > On Wed, Mar 11, 2020 at 04:07:59PM -0700, Kuppuswamy Sathyanarayanan wrote: > > > On 3/11/20 3:23 PM, Bjorn Helgaas wrote: > > > > Is any synchronization needed here between the EDR path and the > > > > hotplug/enumeration path? > > > If we want to follow the implementation note step by step (in > > > sequence) then we need some synchronization between EDR path and > > > enumeration path. But if it's OK to achieve the same end result by > > > following steps out of sequence then we don't need to create any > > > dependency between EDR and enumeration paths. Currently we follow > > > the latter approach. > > What would the synchronization look like? > we might need some way to disable the enumeration path till > we get response from firmware. > > In native hot plug case, I think we can do it in two ways. > > 1. Disable hotplug notification in slot ctl registers. > (pcie_disable_notification()) > 2. Some how block hotplug driver from processing the new > events (not sure how feasible its). > > Following method 1 would be easy, But I am not sure whether > its alright to disable them randomly. I think, unless we > clear the status as well, we might get some issues due to stale > notification history. > > For ACPI event case, I am not sure whether we have some > communication protocol in place to disable receiving ACPI > events temporarily. > > For polling model, we need to disable to the polling > timer thread till we receive _OST response from firmware. > > > > Ideally I think it would be better to follow the order in the > > flowchart if it's not too onerous. > None of the above changes will be pretty and I think it will > not be simple as well. > > That will make the code easier to > > understand. The current situation with this dependency on pciehp and > > what it will do leaves a lot of things implicit. > > > > What happens if CONFIG_PCIE_EDR=y but CONFIG_HOTPLUG_PCI_PCIE=n? > > > > IIUC, when DPC triggers, pciehp is what fields the DLLSC interrupt and > > unbinds the drivers and removes the devices. > > > If that doesn't happen, and Linux clears the DPC trigger to bring > > the link back up, will those drivers try to operate uninitialized > > devices? > > I don't think this will happen. In DPC reset_link before we bring up > the device we wait for link to go down first using > pcie_wait_for_link(pdev, false) function. I understand that, but these child devices were reset when DPC disabled the link. When the link comes back up, their BARs contain zeros. If CONFIG_HOTPLUG_PCI_PCIE=y, the DLLSC interrupt will cause pciehp to unbind the driver. It seems like the unbind races with the EDR notify handler. If pciehp unbinds the driver before edr_handle_event() calls pcie_do_recovery(), this seems fine -- we'll call dpc_reset_link(), which brings up the link, we won't call any driver callbacks because there's no driver, and another DLLSC interrupt will cause pciehp to re-enumerate, which will re-initialize the device, then rebind the driver. If the EDR notify handler runs before pciehp unbinds the driver, couldn't EDR bring up the link and call driver .mmio_enabled() before the device has been initialized? If CONFIG_HOTPLUG_PCI_PCIE=n and CONFIG_HOTPLUG_PCI_ACPI=y, I could believe that the situations are similar to the above. What if CONFIG_HOTPLUG_PCI_PCIE=n and CONFIG_HOTPLUG_PCI_ACPI=n? Then I assume there's nothing to unbind the driver, so pcie_do_recovery() will call the driver .mmio_enabled() and other recovery callbacks on a device that hasn't been initialized?