Re: [PATCH] usb: uas: fix a plug & unplug racing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 14 Jan 2020, Oliver Neukum wrote:

> Am Dienstag, den 14.01.2020, 03:28 +0000 schrieb EJ Hsu:
> > Oliver Neukum wrote:
> > 
> > > Am Sonntag, den 12.01.2020, 19:30 -0800 schrieb EJ Hsu:
> > > 
> > > Isn't that the bug? A command to a detached device should fail.
> > > Could you please elaborate? This issue would not be limited to uas.
> > > 
> > 
> > In the case I mentioned, the hub thread of external hub running 
> > uas_probe() will get stuck waiting for the completion of scsi scan. 
> > 
> > The scsi scan will try to probe a single LUN using a SCSI INQUIRY.
> > If the external hub has been unplugged before LUN probe, the device 
> > state of uas device will be set to USB_STATE_NOTATTACHED by the 
> > root hub thread. So, all the following calls to usb_submit_urb() in 
> > uas driver will return -NODEV, and accordingly uas_queuecommand_lck() 
> > will return SCSI_MLQUEUE_DEVICE_BUSY to scsi_request_fn().
> 
> And that looks like the root cause. The queue isn't busy.
> It is dead.

No.  The discussion has gotten a little confused.  EJ's point is that
if SCSI scanning takes place in the context of the hub worker thread,
then that thread won't be available to process a disconnect
notification.  The device will be unplugged, but the kernel won't 
realize it until the SCSI scanning is finished.

> > scsi_request_fn() then puts this scsi command back into request queue.
> > Because this scsi device is just created and during LUN probe process, 
> > this scsi command is the only one in the request queue. So, it will be picked
> > up soon and dispatched to uas driver again. This cycle will continue until
> > uas_disconnect() is called and its "resetting" flag is set. However, the 
> > hub thread of external hub still got stuck waiting for the completion of
> > this scsi command, and may not be able to run uas_disconnect(). 
> > A deadlock happened.
> 
> I see. But we are working around insufficient error reporting in the
> SCSI midlayer.

No, the error reporting there is correct.  URBs will complete with
errors like -EPROTO but no other indication that the device is gone, so
the midlayer believes that a retry is appropriate.

Perhaps uas should treat -EPROTO, -EILSEQ, and -ETIME as fatal errors.

Alan Stern




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux