Re: [PATCH] usb: uas: fix a plug & unplug racing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 15 Jan 2020, Oliver Neukum wrote:

> Am Dienstag, den 14.01.2020, 10:04 -0500 schrieb Alan Stern:
> > On Tue, 14 Jan 2020, Oliver Neukum wrote:
> > 
> > > Am Dienstag, den 14.01.2020, 03:28 +0000 schrieb EJ Hsu:
> > > > Oliver Neukum wrote:
> > > > 
> > > > > Am Sonntag, den 12.01.2020, 19:30 -0800 schrieb EJ Hsu:
> > > > > 
> > > > > Isn't that the bug? A command to a detached device should fail.
> > > > > Could you please elaborate? This issue would not be limited to uas.
> > > > > 
> > > > 
> > > > In the case I mentioned, the hub thread of external hub running 
> > > > uas_probe() will get stuck waiting for the completion of scsi scan. 
> > > > 
> > > > The scsi scan will try to probe a single LUN using a SCSI INQUIRY.
> > > > If the external hub has been unplugged before LUN probe, the device 
> > > > state of uas device will be set to USB_STATE_NOTATTACHED by the 
> > > > root hub thread. So, all the following calls to usb_submit_urb() in 
> > > > uas driver will return -NODEV, and accordingly uas_queuecommand_lck() 
> > > > will return SCSI_MLQUEUE_DEVICE_BUSY to scsi_request_fn().
> > > 
> > > And that looks like the root cause. The queue isn't busy.
> > > It is dead.
> > 
> > No.  The discussion has gotten a little confused.  EJ's point is that
> > if SCSI scanning takes place in the context of the hub worker thread,
> > then that thread won't be available to process a disconnect
> > notification.  The device will be unplugged, but the kernel won't 
> > realize it until the SCSI scanning is finished.
> 
> OK, I think we have two possible code paths at least
> 
> First:
> uas_queuecommand_lck() -> uas_submit_urbs() -> -ENODEV -> return
> SCSI_MLQUEUE_DEVICE_BUSY
> 
> That looks wrong to me.

Agreed.  Although it isn't the subject of this patch.

> Second:
> The submission actually works and we eventually terminate the URB
> with an error. In that case nothing happens and eventually SCSI core
> retries, times out and does error handling. 
> 
> > > > scsi_request_fn() then puts this scsi command back into request queue.
> > > > Because this scsi device is just created and during LUN probe process, 
> > > > this scsi command is the only one in the request queue. So, it will be picked
> > > > up soon and dispatched to uas driver again. This cycle will continue until
> > > > uas_disconnect() is called and its "resetting" flag is set. However, the 
> > > > hub thread of external hub still got stuck waiting for the completion of
> > > > this scsi command, and may not be able to run uas_disconnect(). 
> > > > A deadlock happened.
> > > 
> > > I see. But we are working around insufficient error reporting in the
> > > SCSI midlayer.
> > 
> > No, the error reporting there is correct.  URBs will complete with
> > errors like -EPROTO but no other indication that the device is gone, so
> > the midlayer believes that a retry is appropriate.
> > 
> > Perhaps uas should treat -EPROTO, -EILSEQ, and -ETIME as fatal errors.
> 
> They could happen due to bad cables.
> 
> Now, another work queue would solve the second error case, but I think
> the first error case still exists and we would paper over it.

Then you do agree with this patch, and you or EJ ought to create 
another patch to handle the first error case.

Alan Stern




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux