"Matthew R. Ochs" <mrochs@xxxxxxxxxxxxxxxxxx> writes:

>>> The process_sense() routine can perform a read capacity which
>>> can take some time to complete. If an EEH occurs while waiting
>>> on the read capacity, the EEH handler is unable to obtain the
>>> context's mutex in order to put the context in an error state.
>>> The EEH handler will sit and wait until the context is free,
>>> but this wait can last longer than the EEH handler tolerates,
>>> leading to a failed recovery.
>>
>> I'm not quite clear on what you mean by the EEH handler timing
>> out. AFAIK there's nothing in eehd and the EEH core that times out if a
>> driver doesn't respond - indeed, it's pretty easy to hang eehd with a
>> misbehaving driver.
>>
>> Are you referring to your own internal timeouts?
>> cxlflash_wait_for_pci_err_recovery and anything else that uses
>> CXLFLASH_PCI_ERROR_RECOVERY_TIMEOUT?
>
> Reading through this again I can see how this is misleading. This is
> actually similar and related to the deadlock scenario described in
> "Fix to avoid potential deadlock on EEH". Without this fix, you'd end
> up in a similar situation but deadlocked on the context mutex instead
> of the ioctl semaphore.

That makes _much_ more sense. If you could please revise the commit
message to explain that, you can include this in the next version:

Reviewed-by: Daniel Axtens <dja@xxxxxxxxxx>

Regards,
Daniel