Re: [PATCH 06/12] scsi_transport_srp: Reduce failover time

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/30/2015 2:02 PM, Bart Van Assche wrote:
On 04/30/15 12:13, Sagi Grimberg wrote:
On 4/30/2015 11:59 AM, Bart Van Assche wrote:
Unlike FC, there is no risk that SCSI devices will be offlined
by the error handler if it is started while a reconnect attempt
is ongoing. Hence report timeout errors as soon as possible
if a higher software layer (e.g. multipathd) needs to be informed
about a timeout. It is assumed that if both fast_io_fail_tmo < 0
and rport->dev_loss_tmo < 0 that this means that there is no
need to report timeouts quickly.

Signed-off-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
Cc: James Bottomley <JBottomley@xxxxxxxx>
Cc: Sagi Grimberg <sagig@xxxxxxxxxxxx>
Cc: Sebastian Parschauer <sebastian.riemer@xxxxxxxxxxxxxxxx>
---
  drivers/scsi/scsi_transport_srp.c | 9 ++++++++-
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_transport_srp.c
b/drivers/scsi/scsi_transport_srp.c
index 4a44337..6667c2b 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -61,6 +61,11 @@ static inline struct Scsi_Host
*rport_to_shost(struct srp_rport *r)
      return dev_to_shost(r->dev.parent);
  }

+static inline struct srp_rport *shost_to_rport(struct Scsi_Host *shost)
+{
+    return transport_class_to_srp_rport(&shost->shost_gendev);
+}
+
  /**
   * srp_tmo_valid() - check timeout combination validity
   * @reconnect_delay: Reconnect delay in seconds.
@@ -626,9 +631,11 @@ static enum blk_eh_timer_return
srp_timed_out(struct scsi_cmnd *scmd)
      struct scsi_device *sdev = scmd->device;
      struct Scsi_Host *shost = sdev->host;
      struct srp_internal *i = to_srp_internal(shost->transportt);
+    struct srp_rport *rport = shost_to_rport(shost);

      pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev));
-    return i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ?
+    return rport->fast_io_fail_tmo < 0 && rport->dev_loss_tmo < 0 &&
+        i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ?
          BLK_EH_RESET_TIMER : BLK_EH_NOT_HANDLED;
  }



I'm not sure I understand the purpose of this patch? If the user
requested fast_io_fail_tmo of X, I would assume it wants srp to retry
for the command for X secs. Why should we fail it any earlier than that?

There is a situation where I've seen srp starting the fast_io_fail_tmo
later than expected because the QP retry exceeded error completion
arrives late for certain workloads. Was this the motivation for this
patch?

The purpose of the srp_timed_out() function is not about failing
commands early but about postponing timeout errors. The above patch
causes a SCSI timeout to be handled as soon as the command timer expires
if rport->fast_io_fail_tmo >= 0 or rport->dev_loss_tmo >= 0.

This change is effective only when
time_to_wc_err + fast_io_fail_tmo < cmd_timeout.

If the user chose this configuration, I imagine the wanted behavior is
that the timeout errors won't be propagated as soon as the cmd timeout
expires no?

Today, srp return BLK_EH_RESET_TIMER until fast_io_fail_tmo kicks in,
and at that point, it unblocks the scsi target terminates all the
commands with TRANSPORT_FAIL_FAST. This will change the behavior to trigger scsi-eh (abort, reset_device) and reset_host will trigger
another reconnect which is redundant (we have reconnect work inflight).

Can you explain the motivation?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux