Am Dienstag, den 14.01.2020, 10:04 -0500 schrieb Alan Stern: > On Tue, 14 Jan 2020, Oliver Neukum wrote: > > > Am Dienstag, den 14.01.2020, 03:28 +0000 schrieb EJ Hsu: > > > Oliver Neukum wrote: > > > > > > > Am Sonntag, den 12.01.2020, 19:30 -0800 schrieb EJ Hsu: > > > > > > > > Isn't that the bug? A command to a detached device should fail. > > > > Could you please elaborate? This issue would not be limited to uas. > > > > > > > > > > In the case I mentioned, the hub thread of external hub running > > > uas_probe() will get stuck waiting for the completion of scsi scan. > > > > > > The scsi scan will try to probe a single LUN using a SCSI INQUIRY. > > > If the external hub has been unplugged before LUN probe, the device > > > state of uas device will be set to USB_STATE_NOTATTACHED by the > > > root hub thread. So, all the following calls to usb_submit_urb() in > > > uas driver will return -NODEV, and accordingly uas_queuecommand_lck() > > > will return SCSI_MLQUEUE_DEVICE_BUSY to scsi_request_fn(). > > > > And that looks like the root cause. The queue isn't busy. > > It is dead. > > No. The discussion has gotten a little confused. EJ's point is that > if SCSI scanning takes place in the context of the hub worker thread, > then that thread won't be available to process a disconnect > notification. The device will be unplugged, but the kernel won't > realize it until the SCSI scanning is finished. OK, I think we have two possible code paths at least First: uas_queuecommand_lck() -> uas_submit_urbs() -> -ENODEV -> return SCSI_MLQUEUE_DEVICE_BUSY That looks wrong to me. Second: The submission actually works and we eventually terminate the URB with an error. In that case nothing happens and eventually SCSI core retries, times out and does error handling. > > > scsi_request_fn() then puts this scsi command back into request queue. > > > Because this scsi device is just created and during LUN probe process, > > > this scsi command is the only one in the request queue. So, it will be picked > > > up soon and dispatched to uas driver again. This cycle will continue until > > > uas_disconnect() is called and its "resetting" flag is set. However, the > > > hub thread of external hub still got stuck waiting for the completion of > > > this scsi command, and may not be able to run uas_disconnect(). > > > A deadlock happened. > > > > I see. But we are working around insufficient error reporting in the > > SCSI midlayer. > > No, the error reporting there is correct. URBs will complete with > errors like -EPROTO but no other indication that the device is gone, so > the midlayer believes that a retry is appropriate. > > Perhaps uas should treat -EPROTO, -EILSEQ, and -ETIME as fatal errors. They could happen due to bad cables. Now, another work queue would solve the second error case, but I think the first error case still exists and we would paper over it. Regards Oliver