On 05/30/2012 01:56 AM, Bart Van Assche wrote:
> On 05/29/12 17:35, Mike Christie wrote:
>
>> On 05/29/2012 10:00 AM, Bart Van Assche wrote:
>>> The patch below makes sure that blk_drain_queue() and blk_cleanup_queue()
>>> wait until all queuecommand invocations have finished and hence fixes a
>>> race between the SCSI error handler and __scsi_remove_device(). Any feedback
>>> is welcome.
>>>
>>> ---
>>>  drivers/scsi/scsi_error.c |   14 +++++++++++++-
>>>  1 files changed, 13 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>>> index 386f0c5..947f627 100644
>>> --- a/drivers/scsi/scsi_error.c
>>> +++ b/drivers/scsi/scsi_error.c
>>> @@ -781,10 +781,17 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
>>>  	struct scsi_device *sdev = scmd->device;
>>>  	struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>>>  	struct Scsi_Host *shost = sdev->host;
>>> +	struct request_queue *q = sdev->request_queue;
>>>  	DECLARE_COMPLETION_ONSTACK(done);
>>>  	unsigned long timeleft;
>>>  	struct scsi_eh_save ses;
>>> -	int rtn;
>>> +	int rtn = FAILED;
>>> +
>>> +	spin_lock_irq(q->queue_lock);
>>> +	if (blk_queue_dead(q))
>>> +		goto out_unlock;
>>> +	q->rq.count[BLK_RW_SYNC]++;
>>> +	spin_unlock_irq(q->queue_lock);
>>
>> Are you hitting a case where a scsi_cmnd does not have a request struct
>> that was allocated through the block layer functions like
>> blk_get_request, but is getting sent through this path? What code is
>> doing this?
>>
>> Or, are you hitting a bug where somehow the request is freed (so the
>> rq.count is decremented) but the scsi eh is still working on a scsi_cmnd
>> that had a request struct allocated for it?
>
>
> I haven't hit any such bugs. This patch is what I came up with after
> analyzing what would be necessary to make sure that queuecommand isn't
> called anymore after blk_cleanup_queue() finished and also to make sure
> that blk_drain_queue() waits until all active queuecommand calls have

It should be waiting now if the scsi_cmnd has a request backing it,
shouldn't it? We will allocate a request struct with blk_get_request or
one of the other blk helpers for each scsi_cmnd, and that will increment
the q->rq.count. If we then go down the error path because a cmd timed
out or because scsi_decide_disposition returned FAILED, then we will
still have that request backing the scsi_cmnd and the count should still
be incremented for it.

When we call scsi_send_eh_cmnd for eh operations, the request is still
there and not freed yet. The request will get freed later when
scsi_eh_flush_done_q is called. In there we will either retry or call
scsi_finish_command, which will go through the normal completion process
and eventually call __blk_put_request and freed_request.
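
To make the accounting argument above concrete, here is a minimal userspace
sketch of the lifetime being described. It is not kernel code: every toy_*
name is hypothetical and only stands in for the kernel function named in its
comment, and the single integer counter is an assumed simplification of
q->rq.count[].

```c
/* Toy userspace model of the request accounting; NOT kernel code.
 * All toy_* names are hypothetical stand-ins for illustration only. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_queue {
	int count;	/* stands in for q->rq.count[] */
	bool dead;	/* stands in for the queue having been marked dead */
};

struct toy_request {
	struct toy_queue *q;	/* queue this request is accounted against */
};

struct toy_cmnd {
	struct toy_request *rq;	/* request backing the command */
};

/* Models blk_get_request(): allocating a request bumps the count. */
static bool toy_get_request(struct toy_queue *q, struct toy_request *rq)
{
	if (q->dead)
		return false;
	rq->q = q;
	q->count++;
	return true;
}

/* Models __blk_put_request()/freed_request(): freeing drops the count. */
static void toy_put_request(struct toy_request *rq)
{
	assert(rq->q != NULL && rq->q->count > 0);
	rq->q->count--;
	rq->q = NULL;
}

/* Models scsi_send_eh_cmnd(): the command is reused for an EH operation,
 * but its backing request is neither freed nor re-accounted. */
static void toy_send_eh_cmnd(struct toy_cmnd *cmd)
{
	assert(cmd->rq != NULL && cmd->rq->q != NULL);
	/* the real code would invoke queuecommand here */
}

/* Models scsi_eh_flush_done_q() ending in scsi_finish_command():
 * only at this point is the backing request released. */
static void toy_eh_flush_done(struct toy_cmnd *cmd)
{
	toy_put_request(cmd->rq);
	cmd->rq = NULL;
}

/* Models the drain condition: outstanding requests mean "keep waiting". */
static bool toy_drain_must_wait(const struct toy_queue *q)
{
	return q->count > 0;
}
```

In this model the count stays at 1 across toy_send_eh_cmnd() and only drops
to 0 in toy_eh_flush_done(), so a drain that waits on the count would already
cover the EH-issued queuecommand call without an extra increment.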