Hi Hannes,

On 07/01/2013 02:50 PM, Hannes Reinecke wrote:
This patch adds an 'eh_deadline' sysfs attribute to the scsi host which limits the overall runtime of the SCSI EH. The 'eh_deadline' value is stored in the now obsolete field 'resetting'. When a command is failed the start time of the EH is stored in 'last_reset'. If the overall runtime of the SCSI EH is longer than last_reset + eh_deadline, the EH is short-circuited and falls through to issue a host reset only.
One thing I noticed during testing: if I want to stop EH as soon as possible, the best I can do is set 'eh_deadline' to its minimum value of 1 second. But on my box, after a SCSI command times out, it takes less than 1 second to reach the first checkpoint, where the overall runtime of the SCSI EH is compared with last_reset + eh_deadline as you described. So EH is not stopped at the first checkpoint; it is only stopped at a later checkpoint, once more than 1 second has elapsed. The same problem exists in your second patchset, "New EH command timeout handler": less than 1 second passes before the checkpoint in scsi_abort_command(). So, should 1 second get special handling? E.g., treat the EH deadline as already passed when 1 second is set, even if 1 second has not actually elapsed. Or should 0 seconds mean "stop EH ASAP" rather than "eh_deadline disabled"?
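To make the timing concrete, here is a minimal userspace sketch of the check as I understand it from the patch description. The names struct eh_host, tick_before() and eh_past_deadline() are made up for illustration and are not the patch's code. I am also assuming 1000 ticks per second and that a deadline of 0 disables the check.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two host fields the description mentions. */
struct eh_host {
	unsigned long last_reset;   /* tick at which EH started */
	unsigned long eh_deadline;  /* allowed EH runtime in ticks; 0 = disabled (assumed) */
};

/* Wrap-safe "a is before b", in the style of the kernel's jiffies comparisons. */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* True once 'now' has reached last_reset + eh_deadline. */
static bool eh_past_deadline(const struct eh_host *h, unsigned long now)
{
	if (h->eh_deadline == 0)
		return false;
	return !tick_before(now, h->last_reset + h->eh_deadline);
}
```

With last_reset = 5000 and eh_deadline = 1000 (1 second under the assumed tick rate), a checkpoint reached at tick 5300 does not trip the deadline, so EH carries on until some later checkpoint. That is the behaviour I see on my box.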
Signed-off-by: Hannes Reinecke<hare@xxxxxxx>
<snip>
@@ -1059,14 +1107,28 @@ static int scsi_eh_abort_cmds(struct list_head *work_q,
 	struct scsi_cmnd *scmd, *next;
 	LIST_HEAD(check_list);
 	int rtn;
+	struct Scsi_Host *shost;
+	unsigned long flags;
 
 	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
 		if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD))
 			continue;
+		shost = scmd->device->host;
+		spin_lock_irqsave(shost->host_lock, flags);
+		if (scsi_host_eh_past_deadline(shost)) {
Specifically: could we remove this checkpoint? In other words, could we keep aborting? In my tests, scsi_try_to_abort_cmd() takes so little time that it can be ignored. So continuing to abort would not noticeably delay stopping EH, and it is worth trying. I'd also like to remove the checkpoint in the newly added scmd_eh_abort_handler() in your second patchset.

Thanks,
Ren
+			spin_unlock_irqrestore(shost->host_lock, flags);
+			list_splice_init(&check_list, work_q);
+			SCSI_LOG_ERROR_RECOVERY(3,
+				shost_printk(KERN_INFO, shost,
+					     "skip %s, past eh deadline\n",
+					     __func__));
+			return list_empty(work_q);
+		}
+		spin_unlock_irqrestore(shost->host_lock, flags);
 		SCSI_LOG_ERROR_RECOVERY(3, printk("%s: aborting cmd:"
 						  "0x%p\n", current->comm,
 						  scmd));
-		rtn = scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd);
+		rtn = scsi_try_to_abort_cmd(shost->hostt, scmd);
 		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
 			scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
 			if (rtn == FAST_IO_FAIL)