Re: [PATCH RESEND] nvme-pci: Fix EEH failure on ppc after subsystem reset

On Sun, Mar 10, 2024 at 12:35:06AM +0530, Nilay Shroff wrote:
> On 3/9/24 21:14, Keith Busch wrote:
> > Your patch may observe a ctrl in "RESETTING" state from
> > error_detected(), then disable the controller, which quiesces the admin
> > queue. Meanwhile, reset_work may proceed to CONNECTING state and try
> > nvme_submit_sync_cmd(), which blocks forever because no one is going to
> > unquiesce that admin queue.
> > 
> OK, I think I get your point. However, it seems that even without my patch
> the above-mentioned deadlock could still be possible.

I sure hope not. The current design should guarantee forward progress for
devices that fail initialization.

> Without my patch, if error_detected() observes a ctrl in "RESETTING" state then
> it still invokes nvme_dev_disable(). The only difference with my patch is that
> error_detected() returns PCI_ERS_RESULT_NEED_RESET instead of PCI_ERS_RESULT_DISCONNECT.

There's one more subtle difference: that condition disables with the
'shutdown' parameter set to 'true', which accomplishes a couple of things:
all entered requests are flushed to their demise via the final
unquiesce, and all request_queues are killed, which forces error returns
for all new request allocations. No thread will be left waiting for
something that won't happen.



