On 4/30/2015 11:59 AM, Bart Van Assche wrote:
Unlike FC, there is no risk that SCSI devices will be offlined by the error handler if it is started while a reconnect attempt is ongoing. Hence report timeout errors as soon as possible if a higher software layer (e.g. multipathd) needs to be informed about a timeout. It is assumed that if both fast_io_fail_tmo < 0 and rport->dev_loss_tmo < 0 that this means that there is no need to report timeouts quickly. Signed-off-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx> Cc: James Bottomley <JBottomley@xxxxxxxx> Cc: Sagi Grimberg <sagig@xxxxxxxxxxxx> Cc: Sebastian Parschauer <sebastian.riemer@xxxxxxxxxxxxxxxx> --- drivers/scsi/scsi_transport_srp.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c index 4a44337..6667c2b 100644 --- a/drivers/scsi/scsi_transport_srp.c +++ b/drivers/scsi/scsi_transport_srp.c @@ -61,6 +61,11 @@ static inline struct Scsi_Host *rport_to_shost(struct srp_rport *r) return dev_to_shost(r->dev.parent); } +static inline struct srp_rport *shost_to_rport(struct Scsi_Host *shost) +{ + return transport_class_to_srp_rport(&shost->shost_gendev); +} + /** * srp_tmo_valid() - check timeout combination validity * @reconnect_delay: Reconnect delay in seconds. @@ -626,9 +631,11 @@ static enum blk_eh_timer_return srp_timed_out(struct scsi_cmnd *scmd) struct scsi_device *sdev = scmd->device; struct Scsi_Host *shost = sdev->host; struct srp_internal *i = to_srp_internal(shost->transportt); + struct srp_rport *rport = shost_to_rport(shost); pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev)); - return i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ? + return rport->fast_io_fail_tmo < 0 && rport->dev_loss_tmo < 0 && + i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ? BLK_EH_RESET_TIMER : BLK_EH_NOT_HANDLED; }
I'm not sure I understand the purpose of this patch? If the user requested fast_io_fail_tmo of X, I would assume it wants srp to retry for the command for X secs. Why should we fail it any earlier than that? There is a situation where I've seen srp starting the fast_io_fail_tmo later than expected because the QP retry exceeded error completion arrives late for certain workloads. Was this the motivation for this patch? Sagi. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html