On Thu, 2017-03-09 at 18:37 +0200, Israel Rukshin wrote: > The bug reproduce when unloading srp module with one port down. > sd_shutdown() hangs when __scsi_remove_device() get scsi_device with > state SDEV_OFFLINE or SDEV_TRANSPORT_OFFLINE. > It hangs because sd_shutdown() is trying to send sync cache command > when the device is offline but with SDEV_CANCEL status. > The status was changed to SDEV_CANCEL by __scsi_remove_device() > before it calls to device_del(). > > The block layer timeout mechanism doesn't cause the SYNCHRONIZE CACHE > command to fail after the timeout expired because the request timer > wasn't started. > blk_peek_request() that is called from scsi_request_fn() didn't return > this request and therefore the request timer didn't start. > > This commit doesn't accept new commands if the original state was offline. > > The bug was revealed after commit cff549 ("scsi: proper state checking > and module refcount handling in scsi_device_get"). > After this commit scsi_device_get() returns error if the device state > is SDEV_CANCEL. > This eventually leads SRP fast I/O failure timeout handler not to clean > the sync cache command because scsi_target_unblock() skip the canceled device. > If this timeout handler is set to infinity then the hang remains forever > also before commit cff549. How could blk_peek_request() not return a request that has not yet been started? How could a patch that changes scsi_device_get() affect I/O since scsi_device_get() is not called from the I/O path? Anyway, could you try to reproduce the hang with the patch below applied and see whether the output produced by this patch helps to determine what is going on? Thanks, Bart. diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index ba2286652ff6..855548ff4c4d 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -3018,8 +3018,10 @@ scsi_internal_device_unblock(struct scsi_device *sdev, else sdev->sdev_state = SDEV_CREATED; } else if (sdev->sdev_state != SDEV_CANCEL && - sdev->sdev_state != SDEV_OFFLINE) + sdev->sdev_state != SDEV_OFFLINE) { + WARN_ONCE(true, "sdev state = %d\n", sdev->sdev_state); return -EINVAL; + } if (q->mq_ops) { blk_mq_start_stopped_hw_queues(q, false); diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 82dfe07b1d47..35aa6b37e199 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -1289,6 +1289,13 @@ void __scsi_remove_device(struct scsi_device *sdev) device_unregister(&sdev->sdev_dev); transport_remove_device(dev); scsi_dh_remove_device(sdev); + + WARN_ON_ONCE(blk_queue_stopped(sdev->request_queue)); + sdev_printk(KERN_INFO, sdev, + "%s: device_busy = %d device_blocked = %d\n", + __func__, atomic_read(&sdev->device_busy), + atomic_read(&sdev->device_blocked)); + device_del(dev); } else put_device(&sdev->sdev_dev); -- 2.12.0