scsi_error: improve the recovery latency for timeouted scsi cmds

Ren Mingxin <renmx@xxxxxxxxxxxxxx> · Wed, 20 Mar 2013 11:24:49 +0800

Hi,

Please let me ask one question about improving the recovery latency
for timed-out scmds:

In the functions 'scsi_eh_wakeup()' and 'scsi_error_handler()', there
are two equivalent condition checks. They ensure that the eh thread
runs only when the number of active scmds equals the number of failed
scmds:

  void scsi_eh_wakeup(struct Scsi_Host *shost)
  {
      if (shost->host_busy == shost->host_failed)
          wake_up_process(shost->ehandler);
  }

  int scsi_error_handler(void *data)
  {
      struct Scsi_Host *shost = data;

      while (!kthread_should_stop()) {
          /* Sleep unless there is work and every busy scmd has failed. */
          if ((shost->host_failed == 0 &&
               shost->host_eh_scheduled == 0) ||
              shost->host_failed != shost->host_busy) {
              schedule();
              continue;
          }
          ....
      }
      ....
  }

I think the original reason for not waking up the eh thread until all
scmds have completed or failed may be to avoid the overhead of waking
the thread up again and again, right?

But in the following case, this strategy does not seem appropriate:

  If a scmd is issued and gets stuck, and then another scmd is issued,
  scsi eh detects the timeout of the first scmd but has to wait until
  the second one has timed out or completed. This means the first
  timed-out scmd cannot be handled in time.
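
To make the case concrete, here is a minimal userspace sketch of the two
counters. It is not kernel code: 'struct host_model' and the helper names
are made up for illustration, and only 'host_busy' and 'host_failed' come
from the code quoted above. It assumes that a timed-out scmd stays counted
in 'host_busy' while 'host_failed' is incremented, and it ignores
'host_eh_scheduled'.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two Scsi_Host counters discussed above. */
struct host_model {
    unsigned int host_busy;   /* scmds issued and not yet finished */
    unsigned int host_failed; /* scmds waiting for error handling  */
};

/* A new scmd is sent to the host. */
static void cmd_issue(struct host_model *h)
{
    h->host_busy++;
}

/* A scmd finishes normally and no longer counts as busy. */
static void cmd_complete(struct host_model *h)
{
    assert(h->host_busy > h->host_failed);
    h->host_busy--;
}

/* A scmd times out: it stays busy but is now also counted as failed. */
static void cmd_timeout(struct host_model *h)
{
    assert(h->host_failed < h->host_busy);
    h->host_failed++;
}

/*
 * Same shape as the condition in scsi_error_handler(), minus
 * host_eh_scheduled: recovery runs only when at least one scmd
 * has failed and no other scmd is still in flight.
 */
static bool eh_would_run(const struct host_model *h)
{
    return h->host_failed != 0 && h->host_failed == h->host_busy;
}
```

With this model, after two cmd_issue() calls and one cmd_timeout() the
counters are host_busy = 2 and host_failed = 1, so eh_would_run() is
false. It only becomes true after a cmd_complete() or a second
cmd_timeout(), which is exactly the delay described above.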

This delay may be serious, especially on critical systems.
So, please let me know the rationale behind the wakeup strategy in
eh. We would like to investigate further based on your comments. Any
suggestions will be appreciated.

Thanks,
Ren