Sathyanarayanan,

On Mon, Sep 28, 2020 at 10:44 AM Kuppuswamy, Sathyanarayanan
<sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 9/25/20 11:30 AM, Sinan Kaya wrote:
> > On 9/25/2020 2:16 PM, Kuppuswamy, Sathyanarayanan wrote:
> >>>
> >>> If this is a too involved change, DPC driver should restore state
> >>> when hotplug is not supported.
> >> Yes. we can add a condition for hotplug capability check.
> >>>
> >>> DPC driver should be self-sufficient by itself.
> >>>
> >
> > Sounds good.
> >
> >>>> Also for non-fatal errors, if reset is requested then we still need
> >>>> some kind of bus reset call here
> >>>
> >>> DPC should handle both fatal and non-fatal cases
> >> Currently DPC is only triggered for FATAL errors.
> >> and cause a bus reset
> >
> > Thanks for the heads up.
> > This seems to have changed since I looked at the DPC code.
> >
> >>> in hardware already before triggering an interrupt.
> >> Error recovery is not triggered only DPC driver. AER also uses the
> >> same error recovery code. If DPC is not supported, then we still need
> >> reset logic.
> >
> > It sounds like we are cross-talking two issues.
> >
> > 1. no state restore on DPC after FATAL error.
> > Let's fix this.
> Agree. Few more detail about the above issue is,
>
> There are two cases under FATAL error.
>
> FATAL + hotplug - In this case, link will be reseted. And hotplug handler
> will remove the driver state. This case works well with current code.
>
> FATAL + no-hotplug - In this case, link will still be reseted. But
> currently driver state is not properly restored. So I attempted
> to restore it using pci_reset_bus().

It seems the fix belongs on the device driver side, not in a double reset
from the DPC driver. One reset is already done by the hardware, and you
want the DPC driver to do another one? Why is the hardware-initiated
reset not enough for you?
Thanks,
Ethan

>         status = reset_link(dev);
> -       if (status != PCI_ERS_RESULT_RECOVERED) {
> +       if (status == PCI_ERS_RESULT_RECOVERED) {
> +               status = PCI_ERS_RESULT_NEED_RESET;
>
> ...
>
>         if (status == PCI_ERS_RESULT_NEED_RESET) {
>                 /*
> -                * TODO: Should call platform-specific
> -                * functions to reset slot before calling
> -                * drivers' slot_reset callbacks?
> +                * TODO: Optimize the call to pci_reset_bus()
> +                *
> +                * There are two components to pci_reset_bus().
> +                *
> +                * 1. Do platform specific slot/bus reset.
> +                * 2. Save/Restore all devices in the bus.
> +                *
> +                * For hotplug capable devices and fatal errors,
> +                * device is already in reset state due to link
> +                * reset. So repeating platform specific slot/bus
> +                * reset via pci_reset_bus() call is redundant. So
> +                * can optimize this logic and conditionally call
> +                * pci_reset_bus().
>                  */
> +               pci_reset_bus(dev);
>
> >
> > 2. no bus reset on NON_FATAL error through AER driver path.
> > This already tells me that you need to split your change into
> > multiple patches.
> >
> > Let's talk about this too. bus reset should be triggered via
> > AER driver before informing the recovery.
> But as per error recovery documentation, any call to
> ->error_detected() or ->mmio_enabled() can request
> PCI_ERS_RESULT_NEED_RESET. So we need to add code
> to do the actual reset before calling ->slot_reset()
> callback. So call to pci_reset_bus() fixes this
> issue.
>
>         if (status == PCI_ERS_RESULT_NEED_RESET) {
> +               pci_reset_bus(dev);
>
>
> >
> >         if (status == PCI_ERS_RESULT_NEED_RESET) {
> >                 /*
> >                  * TODO: Should call platform-specific
> >                  * functions to reset slot before calling
> >                  * drivers' slot_reset callbacks?
> >                  */
> >                 status = PCI_ERS_RESULT_RECOVERED;
> >                 pci_dbg(dev, "broadcast slot_reset message\n");
> >                 pci_walk_bus(bus, report_slot_reset, &status);
> >         }
> >
>
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
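For the NON_FATAL path discussed above, here is a minimal userspace sketch
(again not kernel code) of the flow being proposed: the votes returned by
each driver's ->error_detected()/->mmio_enabled() are merged, and if the
merged result is NEED_RESET, one bus reset (standing in for the
pci_reset_bus() call the patch adds) happens before slot_reset is
broadcast. The ers_result enum, merge_vote(), outcome and
recover_nonfatal() are hypothetical. The merging rule is a simplification
loosely modelled on the result merging in drivers/pci/pcie/err.c, and the
second round of votes from ->slot_reset() is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for enum pci_ers_result; not the kernel definitions. */
enum ers_result {
	ERS_NONE,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
};

/*
 * Combine one driver's vote with the running result (simplified model):
 * NONE changes nothing, CAN_RECOVER/RECOVERED are replaced by any other
 * vote, and NEED_RESET overrides DISCONNECT.
 */
enum ers_result merge_vote(enum ers_result cur, enum ers_result vote)
{
	if (vote == ERS_NONE)
		return cur;
	switch (cur) {
	case ERS_CAN_RECOVER:
	case ERS_RECOVERED:
		return vote;
	case ERS_DISCONNECT:
		return vote == ERS_NEED_RESET ? ERS_NEED_RESET : cur;
	default:
		return cur;
	}
}

struct outcome {
	int bus_resets;            /* stands in for calls to pci_reset_bus() */
	bool slot_reset_broadcast; /* stands in for the report_slot_reset walk */
	enum ers_result final;     /* status after the NEED_RESET step */
};

/* votes[]: what each driver's ->error_detected()/->mmio_enabled() returned. */
struct outcome recover_nonfatal(const enum ers_result *votes, size_t n)
{
	assert(votes != NULL || n == 0);
	enum ers_result status = ERS_CAN_RECOVER;
	for (size_t i = 0; i < n; i++)
		status = merge_vote(status, votes[i]);

	struct outcome o = { 0, false, status };
	if (status == ERS_NEED_RESET) {
		o.bus_resets++;         /* the reset the patch adds */
		status = ERS_RECOVERED; /* as in the quoted pcie_do_recovery() code */
		o.slot_reset_broadcast = true;
	}
	o.final = status;
	return o;
}
```

For example, votes of CAN_RECOVER and NEED_RESET produce one reset
followed by the slot_reset broadcast, while a single CAN_RECOVER vote
produces no reset at all.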