Re: [RFC 8/9] UAS: Cancel pending URBs and free streams in pre-reset.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 02, 2011 at 04:35:08PM -0500, Alan Stern wrote:
> On Fri, 2 Dec 2011, Sarah Sharp wrote:
> 
> > When a UAS device stops responding to commands, it may be necessary to
> > issue a USB device reset (which will reset all device state, including
> > the device configuration and alternate interface settings).  All pending
> > URBs for the device must be canceled and any bulk streams rings that
> > have been allocated for the device must be freed before the reset is
> > issued.
> > 
> > The UAS driver needs to flush any pending URBs in the pre_reset method,
> > which means it needs to do four things:
> > 
> > 1. It needs to get the function that queues SCSI commands
> > (uas_queuecommand_lck) to stop submitting URBs.
> > 
> > 2. It needs to get the completion handler for the status URB
> > (uas_stat_cmplt) to stop submitting data URBs in the USB 2.0 UAS case.
> > 
> > 3. It also needs to make sure that the function that aborts the SCSI
> > commands (uas_eh_abort_handler) isn't in the middle of aborting the same
> > command that the pre_reset method is trying to flush.
> 
> Bear in mind that pre_reset isn't necessarily trying to flush any 
> commands.  It just gets called from time to time.  Maybe because the 
> device is stuck and needs to be reset; maybe because some other driver 
> for a different interface on the same device needs to reset it.
>
> You do need to decide what to do about commands that are in flight.  
> Ideally you would wait until they all complete by themselves (or are
> aborted by the error handler, which amounts to the same thing).  
> Actively cancelling their URBs in preparation for the reset is not as
> safe, although it is necessary when things time out or aborts fail.

It's true that another interface driver could want to reset the USB
device, not just the UAS driver.  But is there a way to tell the
difference without adding code to the UAS driver?

If the UAS driver is trying to reset the device, then it probably makes
sense to flush/kill everything rather than simply wait for it to
complete.  At that point, the device will have already refused to
respond to an abort task IU for a queued SCSI command.  The devices I've
seen that need a reset just seem wedged, and no other commands complete
on other stream IDs.

I'm also not sure we can just wait for the commands to complete.  Since
the device chooses which streams to request data for (and thus which
SCSI commands to service), it's entirely possible the device could just
starve a particular command indefinitely due to some fireware bug.  So
another driver asks for a reset, we let the commands complete on their
own (while not submitting new commands), the starved SCSI command
eventually times out and the abort task fails, and then the UAS driver
asks for a reset.  What happens then?

> > 4. It needs to ensure that the work queue that handles partially
> > submitted SCSI commands (uas_work) isn't trying to resubmit URBs or
> > reschedule itself.
> > 
> > The solution is to introduce a new atomic_t, resetting.  When the
> 
> Why atomic_t?  Is the flag going to be written from multiple threads or
> contexts?

I think I used atomic_t because it has the memory barrier semantics I
need to make SRCU work.  I'll have to dig up my notes on that.

> > resetting flag is set, uas_queuecommand_lck() should tell the SCSI layer
> > that the device is busy for any new commands, uas_stat_cmplt() will tell
> > the SCSI layer the command aborted without submitting the data URBs, and
> 
> This is confusing.  What you want to do is plug your queue.  That is, 
> stop sending out new commands, while waiting for the existing ones to 
> complete.  Is that what you meant?  (I'm not sure exactly what 
> uas_stat_cmplt's job is.)

With USB 2.0 devices, we submit one pair of URBs (the command phase and
status phase), and then submit a second pair of URBs (the data phase and
status phase) in the first status phase's URB completion handler
(uas_stat_cmplt).  If the device is about to be reset because the UAS
driver requested it, I'm not sure it makes sense to submit the second
round of URBs.  I suppose we could let them go through if a different
driver was requesting the reset, but we'd need a flag or something
similar to signal the UAS driver was the requester.

> > uas_eh_abort_handler() should tell the SCSI layer that the abort failed.
> 
> This should happen only for new abort requests arriving after you know 
> you are going to do a reset.  Aborts that are already in progress 
> should be handled normally.

Yes, that was what I had in mind.

> One thing to watch out for is what you do when an abort fails.  If 
> you're not already waiting to do a reset, you might want to start one.

Can usb_kill_anchored_urb() fail?  If the ABORT TASK IU fails to get the
command out of the device command queue, we simply kill the URBs associated
with that command.  If usb_kill_anchored_urb() can't fail, we'll be
fine.  (I think it could only fail if the xHCI host controller failed to
complete the stop endpoint command, at which point the host controller
is just hosed and we have bigger issues.)

> > They can continue normally once pre_reset has re-setup the alternate
> > interface setting, reallocated the stream rings, cleared the resetting
> > flag.
> > 
> > We use SRCU to synchronize between the readers of the resetting flag
> > (uas_queuecommand_lck, uas_xfer_data, and uas_eh_abort_handler) and the
> > writers of the flag (uas_pre_reset and uas_post_reset).  We have to use
> > "sleepable" RCU instead of RCU because the readers must be able to sleep
> > while waiting for URBs to be killed, while holding the reader lock.
> 
> It's not clear that this is the best approach.  You already own the
> host lock in uas_queuecommand_lck; why not use that for
> synchronization?

The host lock isn't taken in the UAS workqueue (which attempts to
resubmit URBs for SCSI commands that initially failed due to out of
memory or errors from usb_submit_urb).  It also isn't taken in the
status phase handler (uas_stat_cmplt) that resubmits the second pair of
URBs for USB 2.0 UAS devices.  I'm not sure if the SCSI host lock can be
taken in interrupt context?

Without knowing about the host lock, I first started to design a
solution using a mutex and a reset flag.  It needed to be a mutex so
that the driver could hold it across killing anchored URBs in the abort
task callback.  Otherwise the pre-reset function could not assume that
the work queue would never be scheduled again.  (Most of the
complications stem from the workqueue.)  However, the original design
wouldn't have worked because I can't take a mutex in the status URB
completion handler.

Plus, it just didn't feel like the right fit for the problem.  It
included an empty lock/unlock pair in the pre_reset function to
synchronize with the two reset flag readers, so they would start
rejecting new abort tasks and new queued commands.  That made it feel
unnatural, like I wasn't using the right kind of locking.

RCU felt like a fit because the reset flag readers who were queueing the
commands, aborting commands, and attempting to resubmit partially failed
commands were part of the fast path of the UAS driver.  Using RCU would
allow a new command to be submitted while an old one was aborted because
it allows multiple readers.  The reset flag writer functions (pre and
post reset) aren't expected to be called very often, so it's fine that
the writer path will be slow.

Here, if you can read my atrocious writing, you can see the two plans
side by side:

https://picasaweb.google.com/lh/photo/rCW18DZ-NC8y_JPXKB8NV9MTjNZETYmyPJy0liipFm0?feat=directlink
https://picasaweb.google.com/lh/photo/D6Y_DwaFd0U0i2-n_UTIwdMTjNZETYmyPJy0liipFm0?feat=directlink

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux