On Tue, 2017-11-07 at 22:42 +0000, Bart Van Assche wrote:
> On Tue, 2017-11-07 at 10:09 -0800, James Bottomley wrote:
> >
> > but can you investigate the root cause rather than trying this
> > bandaid?
>
> Hello James,
>
> Thanks for your reply. I think that the root cause is that SCSI
> scanning activity can continue to submit I/O even after
> scsi_remove_host() has unlocked scan_mutex but that
> scsi_remove_host() removes some of the infrastructure that is
> essential to process SCSI requests.

That's not really a useful answer: how does it submit I/O after the
device goes into DEL? In theory every I/O submitted after this is
returned with an immediate error. I could buy the fact that we have
pending I/O submitted before we go into DEL, which would argue for some
sort of quiesce wait, but I don't see how I/O submitted after DEL
causes a hang.

> Are you OK with
> e.g. moving a significant part of scsi_remove_host() into
> scsi_host_dev_release()?

Well not really without seeing the root cause. Before
scsi_forget_host() it's all about state and after it's just removing
some user visible host attributes, so I can't see how either matters
much. scsi_forget_host() must be executed from scsi_remove_host()
because that's how the devices go into the DEL state and how we error
the requests without troubling the device driver, so that can't be
moved to release.

James