Fix handling of the dev_loss and nodev timeouts. Symptoms: when remote port disappears for a period of time longer then either nodev_tmo or dev_loss_tmo, the lpfc driver worker thread will stall removing that remote port. Cause: removing remote port involves un-blocking and sync-ing corresponding block device queue. But corresponding node in the lpfc driver is still in the NPR(?node port recovery?) state and mid-layer gets SCSI_MLQUEUE_HOST_BUSY as a return value when it is trying to call queuecommand() with command for that node (AKA remote port) Fix: Instead of returning SCSI_MLQUEUE_HOST_BUS from queuecommand() for nodes in NPR states complete it with retry-able error code DID_BUS_BUSY Signed-off-by: James Smart <James.Smart@xxxxxxxxxx> --- a/drivers/scsi/lpfc/lpfc_scsi.c +++ b/drivers/scsi/lpfc/lpfc_scsi.c @@ -746,6 +746,10 @@ lpfc_queuecommand(struct scsi_cmnd *cmnd cmnd->result = ScsiResult(DID_NO_CONNECT, 0); goto out_fail_command; } + else if (ndlp->nlp_state == NLP_STE_NPR_NODE) { + cmnd->result = ScsiResult(DID_BUS_BUSY, 0); + goto out_fail_command; + } /* * The device is most likely recovered and the driver * needs a bit more time to finish. Ask the midlayer - : send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html