Hi,

I've been looking for ways to minimize the impact of a faulty drive in
multipath and raid array environments. Our major problem is that it takes
a long time before the upper layer (dm-mpath, dm-raid, or some other
middleware kernel module written by users) can handle timed out i/o,
because of the lengthy recovery operation done by the scsi driver's error
handler.

Let me explain some details. The scsi error recovery essentially conflicts
with a multipath or raid array environment: where the system itself
provides redundant paths or disks, it is usually designed to switch to an
alternative path or disk right after detecting a faulty drive. As far as I
know there have been a couple of approaches in the scsi or md layer (eg.
http://www.mail-archive.com/linux-raid@xxxxxxxxxxxxxxx/msg09024.html) to
minimize recovery time, although none of them has been merged for some
reason.

This patch is a simple approach that omits the error recovery phase. Each
scsi device has a no_recovery sysfs entry to select whether the device
needs recovery (of course no_recovery is disabled by default). If it is
enabled, the scsi device that corresponds to the timed out command is
offlined immediately and the command is finished with DRIVER_TIMEOUT
status. This gives upper layers a chance to take care of timed out
commands without waiting for recovery to finish.

If no_recovery is disabled, recovery takes as much as 30 minutes or so,
depending on the implementation of the lldd and the number of timed out
commands. This is how it shows in /var/log/messages when the no_recovery
option is enabled, with all the verbose logs for the scsi mid layer and
lldd set.

-----
Apr 21 19:50:47 localhost kernel: lpfc 0000:03:04.0: 0:(0):0309 Mailbox cmd x31 issue Data: x20 x700 x2
Apr 21 19:50:47 localhost kernel: lpfc 0000:03:04.0: 0:(0):0307 Mailbox cmd x31 (x0) Cmpl xf85fa862 Data: x123100 x0 x0 x0 x0 x0 x0 x0 x0
Apr 21 19:50:57 localhost kernel: lpfc 0000:03:04.0: 0:(0):0309 Mailbox cmd x31 issue Data: x20 x700 x2
Apr 21 19:50:57 localhost kernel: lpfc 0000:03:04.0: 0:(0):0307 Mailbox cmd x31 (x0) Cmpl xf85fa862 Data: x123100 x0 x0 x0 x0 x0 x0 x0 x0
Apr 21 19:51:02 localhost kernel: lpfc 0000:03:04.0: 0:(0):0309 Mailbox cmd x31 issue Data: x20 x700 x2
Apr 21 19:51:02 localhost kernel: lpfc 0000:03:04.0: 0:(0):0307 Mailbox cmd x31 (x0) Cmpl xf85fa862 Data: x123100 x0 x0 x0 x0 x0 x0 x0 x0
Apr 21 19:51:07 localhost kernel: lpfc 0000:03:04.0: 0:(0):0309 Mailbox cmd x31 issue Data: x20 x700 x2
Apr 21 19:51:07 localhost kernel: lpfc 0000:03:04.0: 0:(0):0307 Mailbox cmd x31 (x0) Cmpl xf85fa862 Data: x123100 x0 x0 x0 x0 x0 x0 x0 x0
Apr 21 19:51:10 localhost kernel: sd 3:0:0:0: Device offlined - no recovery
Apr 21 19:51:10 localhost kernel: scsi_work_offline_sdev scmd: c5bb7e00 result: 6000000
Apr 21 19:51:10 localhost kernel: sd 3:0:0:0: [sdc] Unhandled error code
Apr 21 19:51:10 localhost kernel: sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Apr 21 19:51:10 localhost kernel: sd 3:0:0:0: [sdc] CDB: Read(10): 28 00 00 02 16 e8 00 00 08 00
Apr 21 19:51:10 localhost kernel: end_request: I/O error, dev sdc, sector 136936
Apr 21 19:51:10 localhost kernel: device-mapper: multipath: Failing path 8:32.
Apr 21 19:51:10 localhost multipathd: 8:32: mark as failed
Apr 21 19:51:10 localhost multipathd: mpatha: remaining active paths: 1
Apr 21 19:51:11 localhost multipathd: dm-4: add map (uevent)
Apr 21 19:51:11 localhost multipathd: dm-4: devmap already registered
-----

Since Linux's scsi layer provides the lldd with the eh_strategy_handler()
interface to implement its own recovery code, I think it is also natural
to provide an interface that lets those who do not want recovery give it
up entirely. Any comments would be helpful, thanks.

By the way, this patch is based on the bug fix
(http://lkml.org/lkml/2010/4/14/29) that I submitted recently. That bug
fix has been merged into the -mm tree.

Thanks,
Tomohiro Kusumi

Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@xxxxxxxxxxxxxx>
---

diff -aNur linux-2.6.34-rc5.org/drivers/scsi/scsi_error.c linux-2.6.34-rc5/drivers/scsi/scsi_error.c
--- linux-2.6.34-rc5.org/drivers/scsi/scsi_error.c	2010-04-20 08:29:56.000000000 +0900
+++ linux-2.6.34-rc5/drivers/scsi/scsi_error.c	2010-04-23 20:34:08.492478179 +0900
@@ -129,6 +129,11 @@
 	scsi_log_completion(scmd, TIMEOUT_ERROR);
 
+	if (scmd->device->no_recovery) {
+		schedule_work(&scmd->tmo_work);
+		return rtn;
+	}
+
 	if (scmd->device->host->transportt->eh_timed_out)
 		rtn = scmd->device->host->transportt->eh_timed_out(scmd);
 	else if (scmd->device->host->hostt->eh_timed_out)
 		rtn = scmd->device->host->hostt->eh_timed_out(scmd);
@@ -2104,3 +2109,25 @@
 	}
 }
 EXPORT_SYMBOL(scsi_build_sense_buffer);
+
+/**
+ * scsi_work_offline_sdev - offline device and finish timed out command
+ * @work: work_struct for scsi command to get rid of
+ **/
+void scsi_work_offline_sdev(struct work_struct *work)
+{
+	struct scsi_cmnd *scmd = container_of(work, struct scsi_cmnd, tmo_work);
+
+	if (scmd->device->sdev_state == SDEV_RUNNING) {
+		scsi_device_set_state(scmd->device, SDEV_OFFLINE);
+		sdev_printk(KERN_INFO, scmd->device,
+				"Device offlined - no recovery\n");
+	}
+
+	if (!scmd->result)
+		scmd->result |= (DRIVER_TIMEOUT << 24);
+
+	SCSI_LOG_ERROR_RECOVERY(1, printk("%s scmd: %p result: %x\n",
+				__func__, scmd, scmd->result));
+	scsi_finish_command(scmd);
+}
diff -aNur linux-2.6.34-rc5.org/drivers/scsi/scsi_lib.c linux-2.6.34-rc5/drivers/scsi/scsi_lib.c
--- linux-2.6.34-rc5.org/drivers/scsi/scsi_lib.c	2010-04-20 08:29:56.000000000 +0900
+++ linux-2.6.34-rc5/drivers/scsi/scsi_lib.c	2010-04-23 20:34:08.493478584 +0900
@@ -1038,6 +1038,7 @@
 	cmd->request = req;
 
 	cmd->cmnd = req->cmd;
+	INIT_WORK(&cmd->tmo_work, scsi_work_offline_sdev);
 
 	return cmd;
 }
diff -aNur linux-2.6.34-rc5.org/drivers/scsi/scsi_priv.h linux-2.6.34-rc5/drivers/scsi/scsi_priv.h
--- linux-2.6.34-rc5.org/drivers/scsi/scsi_priv.h	2010-04-20 08:29:56.000000000 +0900
+++ linux-2.6.34-rc5/drivers/scsi/scsi_priv.h	2010-04-23 20:34:08.493478584 +0900
@@ -73,6 +73,7 @@
 int scsi_eh_get_sense(struct list_head *work_q,
 		      struct list_head *done_q);
 int scsi_noretry_cmd(struct scsi_cmnd *scmd);
+void scsi_work_offline_sdev(struct work_struct*);
 
 /* scsi_lib.c */
 extern int scsi_maybe_unblock_host(struct scsi_device *sdev);
diff -aNur linux-2.6.34-rc5.org/drivers/scsi/scsi_scan.c linux-2.6.34-rc5/drivers/scsi/scsi_scan.c
--- linux-2.6.34-rc5.org/drivers/scsi/scsi_scan.c	2010-04-20 08:29:56.000000000 +0900
+++ linux-2.6.34-rc5/drivers/scsi/scsi_scan.c	2010-04-23 20:34:08.494477948 +0900
@@ -257,6 +257,7 @@
 	sdev->lun = lun;
 	sdev->channel = starget->channel;
 	sdev->sdev_state = SDEV_CREATED;
+	sdev->no_recovery = 0;
 	INIT_LIST_HEAD(&sdev->siblings);
 	INIT_LIST_HEAD(&sdev->same_target_siblings);
 	INIT_LIST_HEAD(&sdev->cmd_list);
diff -aNur linux-2.6.34-rc5.org/drivers/scsi/scsi_sysfs.c linux-2.6.34-rc5/drivers/scsi/scsi_sysfs.c
--- linux-2.6.34-rc5.org/drivers/scsi/scsi_sysfs.c	2010-04-23 19:59:21.896228214 +0900
+++ linux-2.6.34-rc5/drivers/scsi/scsi_sysfs.c	2010-04-23 20:36:27.120227650 +0900
@@ -544,6 +544,7 @@
 sdev_rd_attr (vendor, "%.8s\n");
 sdev_rd_attr (model, "%.16s\n");
 sdev_rd_attr (rev, "%.4s\n");
+sdev_rw_attr (no_recovery, "%d\n");
 
 /*
  * TODO: can we make these symlinks to the block layer ones?
@@ -738,6 +739,7 @@
 	&dev_attr_iodone_cnt.attr,
 	&dev_attr_ioerr_cnt.attr,
 	&dev_attr_modalias.attr,
+	&dev_attr_no_recovery.attr,
 	REF_EVT(media_change),
 	NULL
 };
diff -aNur linux-2.6.34-rc5.org/include/scsi/scsi_cmnd.h linux-2.6.34-rc5/include/scsi/scsi_cmnd.h
--- linux-2.6.34-rc5.org/include/scsi/scsi_cmnd.h	2010-04-20 08:29:56.000000000 +0900
+++ linux-2.6.34-rc5/include/scsi/scsi_cmnd.h	2010-04-23 20:34:08.495478045 +0900
@@ -129,6 +129,8 @@
 	int result;		/* Status code from lower level driver */
 
 	unsigned char tag;	/* SCSI-II queued command tag */
+
+	struct work_struct tmo_work;	/* work for no recovery on timeout */
 };
 
 extern struct scsi_cmnd *scsi_get_command(struct scsi_device *, gfp_t);
diff -aNur linux-2.6.34-rc5.org/include/scsi/scsi_device.h linux-2.6.34-rc5/include/scsi/scsi_device.h
--- linux-2.6.34-rc5.org/include/scsi/scsi_device.h	2010-04-20 08:29:56.000000000 +0900
+++ linux-2.6.34-rc5/include/scsi/scsi_device.h	2010-04-23 20:35:29.255569542 +0900
@@ -163,6 +163,8 @@
 	atomic_t iodone_cnt;
 	atomic_t ioerr_cnt;
 
+	int no_recovery;	/* no error recovery on timeout if set */
+
 	struct device		sdev_gendev,
 				sdev_dev;
 