Hi Keith,

On 2/28/24 16:49, Nilay Shroff wrote:
>
>
> On 2/27/24 23:59, Keith Busch wrote:
>> On Fri, Feb 09, 2024 at 10:32:16AM +0530, Nilay Shroff wrote:
>>> If the nvme subsystem reset causes the loss of communication to the nvme
>>> adapter then EEH could potentially recover the adapter. The detection of
>>> communication loss to the adapter only happens when the nvme driver
>>> attempts to read an MMIO register.
>>>
>>> The nvme subsystem reset command writes 0x4E564D65 to the NSSR register and
>>> schedules an adapter reset. If the nvme subsystem reset causes the loss
>>> of communication to the nvme adapter then either an IO timeout event or
>>> the adapter reset handler could detect it. If the IO timeout event detects
>>> the loss of communication then the EEH handler is able to recover the
>>> communication to the adapter. This was implemented in 651438bb0af5
>>> (nvme-pci: Fix EEH failure on ppc). However, if the adapter communication
>>> loss is detected in the nvme reset work handler then EEH is unable to
>>> successfully finish the adapter recovery.
>>>
>>> This patch ensures that,
>>> - the nvme driver reset handler observes that the pci channel is offline
>>>   after a failed MMIO read and avoids marking the controller state DEAD,
>>>   thus giving the EEH handler a fair chance to recover the nvme adapter.
>>>
>>> - if the nvme controller is already in the RESETTING state and a pci channel
>>>   frozen error is detected then the nvme driver pci-error-handler code sends
>>>   the correct error code (PCI_ERS_RESULT_NEED_RESET) back to the EEH handler
>>>   so that the EEH handler can proceed with the pci slot reset.
>>
>> A subsystem reset takes the link down. I'm pretty sure the proper way to
>> recover from it requires pcie hotplug support.
>>
> Yes, you're correct. We require pcie hotplugging to recover. However, the
> powerpc EEH handler is able to recover the pcie adapter without physically
> removing and re-inserting it, or in other words, it can reset the adapter
> without hotplug activity. In fact, powerpc EEH can isolate the pcie slot and
> reset it (i.e. reset the PCI device by holding the PCI #RST line high for two
> seconds), followed by setting up the device config space (the base address
> registers (BARs), latency timer, cache line size, interrupt line, and so on).
>
> You may find more information about EEH recovery here:
> https://www.kernel.org/doc/Documentation/powerpc/eeh-pci-error-recovery.txt
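
As a bit of extra context, the driver-side half of that hand-off is the
error_detected() callback. The sketch below is illustrative only (a simplified
stand-in named demo_error_detected(), not the actual nvme code and not the
proposed patch); it just shows how such a callback maps the pci channel state
onto the recovery verdict that EEH acts on in the sequences quoted below:

#include <linux/pci.h>

/*
 * Illustrative sketch only -- not the actual nvme code or the proposed
 * patch.  It maps the pci channel state to the verdict EEH acts on.
 */
static pci_ers_result_t demo_error_detected(struct pci_dev *pdev,
                                            pci_channel_state_t state)
{
        switch (state) {
        case pci_channel_io_normal:
                /* Nothing is fenced yet; the driver may try to recover. */
                return PCI_ERS_RESULT_CAN_RECOVER;
        case pci_channel_io_frozen:
                /*
                 * MMIO is fenced; ask the platform (EEH on powerpc) to reset
                 * the slot, then carry on in slot_reset() and resume().
                 */
                return PCI_ERS_RESULT_NEED_RESET;
        case pci_channel_io_perm_failure:
        default:
                /* Recovery is not possible; the device will be removed. */
                return PCI_ERS_RESULT_DISCONNECT;
        }
}
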
> Typically, when a pcie error is detected and EEH is able to recover the
> device, the EEH handler code goes through the below sequence (assuming the
> driver is EEH aware):
>
> eeh_handle_normal_event()
>   eeh_set_channel_state() -> set state to pci_channel_io_frozen
>   eeh_report_error()
>     nvme_error_detected() -> channel state pci_channel_io_frozen; returns PCI_ERS_RESULT_NEED_RESET
>   eeh_slot_reset()        -> recovery successful
>     nvme_slot_reset()     -> returns PCI_ERS_RESULT_RECOVERED
>   eeh_set_channel_state() -> set state to pci_channel_io_normal
>   nvme_error_resume()
>
> In case a pcie error is detected and EEH is unable to recover the device,
> the EEH handler code goes through the below sequence:
>
> eeh_handle_normal_event()
>   eeh_set_channel_state() -> set state to pci_channel_io_frozen
>   eeh_report_error()
>     nvme_error_detected() -> channel state pci_channel_io_frozen; returns PCI_ERS_RESULT_NEED_RESET
>   eeh_slot_reset()        -> recovery failed
>   eeh_report_failure()
>     nvme_error_detected() -> channel state pci_channel_io_perm_failure; returns PCI_ERS_RESULT_DISCONNECT
>   eeh_set_channel_state() -> set state to pci_channel_io_perm_failure
>   nvme_remove()
>
>
> If we execute the command "nvme subsystem-reset ..." and adapter communication
> is lost, then in the current code (under nvme_reset_work()) we simply disable
> the device and mark the controller DEAD. However, we may have a chance to
> recover the controller if the driver is EEH aware and EEH recovery is
> underway. We already handle one such case in nvme_timeout(). So this patch
> ensures that if we fall through nvme_reset_work() post subsystem-reset and
> EEH recovery is in progress, then we give the EEH mechanism a chance to
> recover the adapter. If the EEH recovery is unsuccessful, we'd anyway fall
> through the code path I mentioned above, where we invoke nvme_remove() at the
> end and delete the erring controller.
>

BTW, a similar issue was earlier fixed in 651438bb0af5 (nvme-pci: Fix EEH
failure on ppc). That fix was needed because the controller health check
polling was removed in b2a0eb1a0ac72869 (nvme-pci: Remove watchdog timer).
In fact, today we may be able to recover the NVMe adapter if a subsystem-reset
or any other PCI error occurs while some I/O request is in flight. The
recovery is possible because the in-flight I/O request would eventually time
out, and nvme_timeout() has special code (added in 651438bb0af5) that gives
EEH a chance to recover the adapter. However, later, in 1e866afd4bcdd (nvme:
ensure subsystem reset is single threaded), the nvme subsystem reset code was
reworked, so now when the user executes the subsystem-reset command, the
kernel first writes 0x4E564D65 to the nvme NSSR register and then schedules
the adapter reset. It's quite possible that when subsystem-reset is executed
there is no I/O in flight, and hence we may never hit nvme_timeout(). Later,
when the adapter reset code (under nvme_reset_work()) starts executing, it
accesses MMIO registers. Hence, IMO, nvme_reset_work() potentially needs
changes similar to those implemented in nvme_timeout() so that EEH recovery
becomes possible.
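
To make that concrete, here is the shape of the check involved. This is an
illustrative sketch only: demo_timeout() and demo_reset_work() are simplified
stand-ins for nvme_timeout() and nvme_reset_work(), not the actual kernel
code. The timeout path already backs off when pci_channel_offline() reports
that the platform has fenced the slot, and the idea is that the reset work
path should do the equivalent before marking the controller DEAD:

#include <linux/blk-mq.h>
#include <linux/pci.h>

/* Timeout path (in the spirit of 651438bb0af5): give EEH more time. */
static enum blk_eh_timer_return demo_timeout(struct pci_dev *pdev)
{
        if (pci_channel_offline(pdev))
                return BLK_EH_RESET_TIMER;      /* EEH recovery in progress */

        /* ... otherwise the normal timeout handling runs ... */
        return BLK_EH_DONE;
}

/* Reset-work path (what is being argued for here): don't declare DEAD. */
static void demo_reset_work(struct pci_dev *pdev)
{
        if (pci_channel_offline(pdev)) {
                dev_warn(&pdev->dev,
                         "PCI recovery is ongoing so let it finish\n");
                return;
        }

        /* ... otherwise fall through to the existing failure handling ... */
}

With such a check in place the reset work returns early, and the EEH
slot_reset/resume callbacks (visible in the log below) complete the recovery.
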
> With the proposed patch, we find that EEH recovery is successful post
> subsystem-reset. Please find below the relevant output:
>
> # lspci
> 0524:28:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller CM7 2.5" (rev 01)
>
> # nvme list-subsys
> nvme-subsys0 - NQN=nqn.2019-10.com.kioxia:KCM7DRUG1T92:7DQ0A01206N3
>                hostnqn=nqn.2014-08.org.nvmexpress:uuid:41528538-e8ad-4eaf-84a7-9c552917d988
>                iopolicy=numa
> \
>  +- nvme0 pcie 0524:28:00.0 live
>
> # nvme subsystem-reset /dev/nvme0
>
> # nvme list-subsys
> nvme-subsys0 - NQN=nqn.2019-10.com.kioxia:KCM7DRUG1T92:7DQ0A01206N3
>                hostnqn=nqn.2014-08.org.nvmexpress:uuid:41528538-e8ad-4eaf-84a7-9c552917d988
>                iopolicy=numa
> \
>  +- nvme0 pcie 0524:28:00.0 resetting
>
> [10556.034082] EEH: Recovering PHB#524-PE#280000
> [10556.034108] EEH: PE location: N/A, PHB location: N/A
> [10556.034112] EEH: Frozen PHB#524-PE#280000 detected
> [10556.034115] EEH: Call Trace:
> [10556.034117] EEH: [c000000000051068] __eeh_send_failure_event+0x7c/0x15c
> [10556.034304] EEH: [c000000000049bcc] eeh_dev_check_failure.part.0+0x27c/0x6b0
> [10556.034310] EEH: [c008000004753d3c] nvme_pci_reg_read32+0x80/0xac [nvme]
> [10556.034319] EEH: [c0080000045f365c] nvme_wait_ready+0xa4/0x18c [nvme_core]
> [10556.034333] EEH: [c008000004754750] nvme_dev_disable+0x370/0x41c [nvme]
> [10556.034338] EEH: [c008000004757184] nvme_reset_work+0x1f4/0x3cc [nvme]
> [10556.034344] EEH: [c00000000017bb8c] process_one_work+0x1f0/0x4f4
> [10556.034350] EEH: [c00000000017c24c] worker_thread+0x3bc/0x590
> [10556.034355] EEH: [c00000000018a87c] kthread+0x138/0x140
> [10556.034358] EEH: [c00000000000dd58] start_kernel_thread+0x14/0x18
> [10556.034363] EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
> [10556.034368] EEH: Notify device drivers to shutdown
> [10556.034371] EEH: Beginning: 'error_detected(IO frozen)'
> [10556.034376] PCI 0524:28:00.0#280000: EEH: Invoking nvme->error_detected(IO frozen)
> [10556.034379] nvme nvme0: frozen state error detected, reset controller
> [10556.102654] nvme 0524:28:00.0: enabling device (0000 -> 0002)
> [10556.103171] nvme nvme0: PCI recovery is ongoing so let it finish
> [10556.142532] PCI 0524:28:00.0#280000: EEH: nvme driver reports: 'need reset'
> [10556.142535] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
> [...]
> [...]
> [10556.148172] EEH: Reset without hotplug activity
> [10558.298672] EEH: Beginning: 'slot_reset'
> [10558.298692] PCI 0524:28:00.0#280000: EEH: Invoking nvme->slot_reset()
> [10558.298696] nvme nvme0: restart after slot reset
> [10558.301925] PCI 0524:28:00.0#280000: EEH: nvme driver reports: 'recovered'
> [10558.301928] EEH: Finished:'slot_reset' with aggregate recovery state:'recovered'
> [10558.301939] EEH: Notify device driver to resume
> [10558.301944] EEH: Beginning: 'resume'
> [10558.301947] PCI 0524:28:00.0#280000: EEH: Invoking nvme->resume()
> [10558.331051] nvme nvme0: Shutdown timeout set to 10 seconds
> [10558.356679] nvme nvme0: 16/0/0 default/read/poll queues
> [10558.357026] PCI 0524:28:00.0#280000: EEH: nvme driver reports: 'none'
> [10558.357028] EEH: Finished:'resume'
> [10558.357035] EEH: Recovery successful.
>
> # nvme list-subsys
> nvme-subsys0 - NQN=nqn.2019-10.com.kioxia:KCM7DRUG1T92:7DQ0A01206N3
>                hostnqn=nqn.2014-08.org.nvmexpress:uuid:41528538-e8ad-4eaf-84a7-9c552917d988
>                iopolicy=numa
> \
>  +- nvme0 pcie 0524:28:00.0 live
>

Thanks,
--Nilay