On Sun, Apr 30, 2023 at 12:39:12PM -0700, Bart Van Assche wrote: > On 4/26/23 12:20, Alan Stern wrote: > > > This patch fixes the issue, not sure if it's correct: > > > > > > --- a/drivers/usb/storage/scsiglue.c > > > +++ b/drivers/usb/storage/scsiglue.c > > > @@ -455,6 +455,9 @@ static int device_reset(struct scsi_cmnd *srb) > > > usb_stor_dbg(us, "%s called\n", __func__); > > > + if (us->srb == srb) > > > + command_abort(srb); > > > + > > > /* lock the device pointers and do the reset */ > > > mutex_lock(&(us->dev_mutex)); > > > result = us->transport_reset(us); > > > > Maybe... But it would be better to check first whether the SCSI core is > > supposed to be reusing an active srb in this way. > > > > Martin, can tell us what is supposed to happen here? Is the > > eh_device_reset_handler routine supposed to be called with a scsi_cmnd > > for a currently active command? > > Hi Alan, > > I'm not aware of any other .eh_device_reset_handler implementation that > aborts the command that is passed to this callback before it aborts other > SCSI commands. That's not a good description of what's happening. For one thing, usb-storage can process only one command at a time, so the "before it aborts other SCSI commands" part doesn't apply. In fact, the proposed patch aborts the current command (which happens to be the same as the one passed to the .eh_device_reset_handler routine) before beginning a reset. > However, I'm not aware of an equivalent of us_data.dev_mutex > in other SCSI LLDs either. Maybe this deadlock is specific to the USB > storage handler? The deadlock, if that's what it is, is related to the fact that usb-storage does only one thing at a time. The mutex is there merely to enforce this; removing it wouldn't help matters since the driver still wouldn't be capable of doing more than one thing at a time. The underlying unstated issue is that usb-storage doesn't have any timeouts of its own. If a device fails to complete a command properly and the SCSI error handler doesn't abort the command, then the command will never finish and the driver will hang. (Until the device is hot-unplugged, of course.) I suppose a bus reset would probably cause an ongoing command to fail. A device reset almost certainly would not, since most device implementations of the USB bulk-only mass-storage protocol do a very bad job of handling device resets. Either way, it's a lot cleaner simply to abort the command before starting the reset. If usb-storage can't depend on the SCSI error handler to do this for us, it will have to do so itself. Alan Stern > Martin, feel free to correct me if I got anything wrong. > > Bart.