> On Sep 28, 2015, at 8:25 PM, Daniel Axtens <dja@xxxxxxxxxx> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> "Matthew R. Ochs" <mrochs@xxxxxxxxxxxxxxxxxx> writes:
>
>> The process_sense() routine can perform a read capacity which
>> can take some time to complete. If an EEH occurs while waiting
>> on the read capacity, the EEH handler is unable to obtain the
>> context's mutex in order to put the context in an error state.
>> The EEH handler will sit and wait until the context is free,
>> but this wait can last longer than the EEH handler tolerates,
>> leading to a failed recovery.
>
> I'm not quite clear on what you mean by the EEH handler timing
> out. AFAIK there's nothing in eehd and the EEH core that times out if a
> driver doesn't respond - indeed, it's pretty easy to hang eehd with a
> misbehaving driver.
>
> Are you referring to your own internal timeouts?
> cxlflash_wait_for_pci_err_recovery and anything else that uses
> CXLFLASH_PCI_ERROR_RECOVERY_TIMEOUT?

Reading through this again I can see how this is misleading. This is
actually similar and related to the deadlock scenario described in
"Fix to avoid potential deadlock on EEH". Without this fix, you'd end
up in a similar situation but deadlocked on the context mutex instead
of the ioctl semaphore.
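
To make the locking pattern concrete, here is a minimal userspace sketch using pthreads. It is not the driver code. The struct and the function names (process_sense_sketch, error_handler_sketch, run_demo) are hypothetical stand-ins. It also assumes that the way out is to release the context mutex around the slow command, so the error handler can take the mutex and mark the context, and to re-check the state after reacquiring it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a per-context structure (not the cxlflash type). */
struct ctx {
	pthread_mutex_t mutex;
	bool in_error;		/* set by the error handler, under mutex */
};

struct worker_arg {
	struct ctx *c;
	int rc;
};

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/* Stand-in for a slow command such as a read capacity. */
static void slow_read_capacity(void)
{
	sleep_ms(200);
}

/* Caller holds c->mutex on entry and on return. */
static int process_sense_sketch(struct ctx *c)
{
	pthread_mutex_unlock(&c->mutex);	/* let the error handler in */
	slow_read_capacity();
	pthread_mutex_lock(&c->mutex);

	/* The state may have changed while the mutex was dropped. */
	return c->in_error ? -1 : 0;
}

/* Stand-in for the EEH path: needs the mutex to flag the context. */
static void error_handler_sketch(struct ctx *c)
{
	pthread_mutex_lock(&c->mutex);
	c->in_error = true;
	pthread_mutex_unlock(&c->mutex);
}

static void *worker(void *p)
{
	struct worker_arg *a = p;

	pthread_mutex_lock(&a->c->mutex);
	a->rc = process_sense_sketch(a->c);
	pthread_mutex_unlock(&a->c->mutex);
	return NULL;
}

/* Runs the worker and fires the error handler during the slow command. */
int run_demo(void)
{
	struct ctx c = { .in_error = false };
	struct worker_arg a = { &c, 0 };
	pthread_t t;
	int err;

	pthread_mutex_init(&c.mutex, NULL);
	err = pthread_create(&t, NULL, worker, &a);
	assert(err == 0);
	(void)err;

	sleep_ms(20);			/* worker is normally mid-command now */
	error_handler_sketch(&c);	/* does not wait out the slow command */

	pthread_join(t, NULL);
	pthread_mutex_destroy(&c.mutex);
	return a.rc;
}
```

If process_sense_sketch() kept the mutex held across slow_read_capacity(), error_handler_sketch() would block for the full duration of the command, which is the userspace analogue of the situation described above. With the mutex dropped, the handler gets in promptly and the worker sees the error state once it reacquires the lock, so run_demo() should return -1.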