On 07/12/2013 12:00 PM, Ren Mingxin wrote: > Hi, Hannes: > > On 07/12/2013 02:09 PM, Hannes Reinecke wrote: >> On 07/12/2013 06:14 AM, Ren Mingxin wrote: >>> On 07/01/2013 10:24 PM, Hannes Reinecke wrote: >>>> With the original SCSI EH I got: >>>> # time dd if=/dev/zero of=/dev/dm-2 bs=4k count=4k oflag=direct >>>> 4096+0 records in >>>> 4096+0 records out >>>> 16777216 bytes (17 MB) copied, 142.652 s, 118 kB/s >>>> >>>> real 2m22.657s >>>> user 0m0.013s >>>> sys 0m0.145s >>>> >>>> With this patchset I got: >>>> # time dd if=/dev/zero of=/dev/dm-2 bs=4k count=4k oflag=direct >>>> 4096+0 records in >>>> 4096+0 records out >>>> 16777216 bytes (17 MB) copied, 52.1579 s, 322 kB/s >>>> >>>> real 0m52.163s >>>> user 0m0.012s >>>> sys 0m0.145s >>>> >>>> Test was to disable RSCN on the target port, disable the >>>> target port, and then start the 'dd' command as indicated. >>> >>> Do you mean disabling RSCN/port is enough? I'm afraid I couldn't >>> reproduce the problem by your steps. Both with and without your >>> patchset are the same 'dd' result: 27s. Please let me know where I >>> neglected or mistook: >>> >>> 1) I made a dm-multipath target 'dm-0' whose grouping policy was >>> failover; >>> 2) Disable RSCN/port via brocade fc switch: >>> SW300:root> portcfg rscnsupr 15 --enable; portDisable 15 >>> 3) Start the 'dd' command: >>> # time dd if=/dev/zero of=/dev/dm-0 bs=4k count=4k oflag=direct >>> dd: writing `/dev/sde': Input/output error >>> 1+0 records in >>> 0+0 records out >>> 0 bytes (0 B) copied, 27.8588 s, 0.0 kB/s >>> >>> real 0m27.860s >>> user 0m0.001s >>> sys 0m0.000s >> >> You are aware that you have to disable RSCNs on the _target_ port, >> right? >> Disabling RSCNs on the _initiator_ ports is a well-tested case, and >> the one which actually makes sense (and is even implemented in >> QLogic switches). >> Disabling RSCNs for the _target_ port, OTOH, has a very questionable >> nature (hence QLogic switches don't even allow you to do this). > > You're right. By disabling RSCNs on target port, I've reproduced this > problem. Thank you so much. But I've encountered the bug I said > before. I'll test again with your new patchset once you send. > Could you check with the attached patch? That should convert it to delayed_work and avoid this issue. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@xxxxxxx +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
>From 2d0851bea02e3d15bf403e6800cc7164f2e211f2 Mon Sep 17 00:00:00 2001 From: Hannes Reinecke <hare@xxxxxxx> Date: Fri, 12 Jul 2013 12:06:19 +0200 Subject: [PATCH] scsi: use delayed_work to trigger aborts Command aborts are invoked from an interrupt, so we cannot use cancel_work_sync() when an outstanding abort needs to be cancelled. So convert the mechanism to delayed_work, which can be cancelled even from an interrupt context. Signed-off-by: Hannes Reinecke <hare@xxxxxxx> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index a62ae84..13a3418 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -297,7 +297,7 @@ struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, gfp_t gfp_mask) cmd->device = dev; INIT_LIST_HEAD(&cmd->list); - INIT_WORK(&cmd->abort_work, scmd_eh_abort_handler); + INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler); spin_lock_irqsave(&dev->list_lock, flags); list_add_tail(&cmd->list, &dev->cmd_list); spin_unlock_irqrestore(&dev->list_lock, flags); @@ -354,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd) list_del_init(&cmd->list); spin_unlock_irqrestore(&cmd->device->list_lock, flags); + cancel_delayed_work_sync(&cmd->abort_work); + __scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev); } EXPORT_SYMBOL(scsi_put_command); diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 28c272f..cd443b2 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -187,7 +187,7 @@ void scmd_eh_abort_handler(struct work_struct *work) { struct scsi_cmnd *scmd = - container_of(work, struct scsi_cmnd, abort_work); + container_of(work, struct scsi_cmnd, abort_work.work); struct scsi_device *sdev = scmd->device; unsigned long flags; int rtn; @@ -254,7 +254,7 @@ scsi_abort_command(struct scsi_cmnd *scmd) SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd, "scmd %p abort timeout\n", scmd)); - cancel_work_sync(&scmd->abort_work); + cancel_delayed_work(&scmd->abort_work); return BLK_EH_NOT_HANDLED; } @@ -279,7 +279,7 @@ scsi_abort_command(struct scsi_cmnd *scmd) SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd, "scmd %p abort scheduled\n", scmd)); - schedule_work(&scmd->abort_work); + schedule_delayed_work(&scmd->abort_work, HZ / 100); return BLK_EH_SCHEDULED; } EXPORT_SYMBOL_GPL(scsi_abort_command); diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h index a2f062e..e3137e3 100644 --- a/include/scsi/scsi_cmnd.h +++ b/include/scsi/scsi_cmnd.h @@ -55,7 +55,7 @@ struct scsi_cmnd { struct scsi_device *device; struct list_head list; /* scsi_cmnd participates in queue lists */ struct list_head eh_entry; /* entry for the host eh_cmd_q */ - struct work_struct abort_work; + struct delayed_work abort_work; int eh_eflags; /* Used by error handlr */ /*