Re: [RFC 8/9] UAS: Cancel pending URBs and free streams in pre-reset.

Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> · Tue, 6 Dec 2011 10:54:59 -0800

On Mon, Dec 05, 2011 at 05:14:57PM -0500, Alan Stern wrote:
> On Mon, 5 Dec 2011, Sarah Sharp wrote:
> > > In that case the other driver would have to wait for the timeout.  You
> > > could improve the situation a little when this occurs by giving the
> > > existing commands only a short time in which to complete, after which
> > > you try to abort them.  The abort would also need to have a short
> > > timeout.
> > 
> > Ok, so for each anchor, I would call usb_wait_anchor_empty_timeout()
> > with a short timeout, then usb_kill_anchored_urbs()?
> > 
> > BTW, how short is short?  There could be up to 256 commands queued and
> > we don't want the user to wait for, say, more than a second, so I don't
> > think the timeout should be bigger than about 3ms?
> 
> You could use a single timeout of 500 ms (or something like that) for
> all the queued commands in parallel.  Anything that hasn't finished by 
> then gets aborted.

Sure, makes sense.

> > > What happens when you decide to cancel a command that has some URBs 
> > > sitting on the workqueue's queue?  How do you prevent the workqueue 
> > > from submitting them?
> > 
> > We take the workqueue lock and remove them from the work queue.  If the
> > workqueue has already re-submitted the URBs and removed them from its
> > own workqueue, then we'll just wait on them with
> > usb_kill_anchored_urbs().
> > 
> > But it's not so simple in the reset case.  We want to stop the work
> > queue completely and kill any pending URBs, so we have to sync with all
> > the functions that might schedule the work queue, and prevent any new
> > URBs from being submitted.
> 
> But the workqueue gets scheduled from only a few places, right?  So
> once you have prevented new commands from reaching those places
> (primarily the queuecommand routine -- I don't think commands can come
> from anywhere else), then all the workqueue needs is the usual abort
> processing.
> 
> In other words, once the synchronization for abort is done properly, it
> turns out that pre-reset needs only to synchronize with queuecommand.  
> And that requires just the host lock.

Ok, sure, we could use the host lock to synchronize the resetting flag,
if uas_queuecommand_lck is the only one reading it.

> > I think the race condition you saw would go something like this:
> > 
> >  - SCSI/block layer attempts to cancel a command for a USB 2.0 UAS
> >    device, and uas_eh_abort_handler() gets past issuing the abort task,
> >    and killing any anchored URBs.
> >  - In interrupt, the status command completes, submits the new data and
> >    status URBs, and returns.
> >  - uas_eh_abort_handler() returns and lets the SCSI core know the task
> >    was successfully aborted.
> >  - Later the two URBs return, and we attempt to complete a command the
> >    SCSI/block layer already returned as canceled.
> 
> More or less.  There are other similar races, such as between sending
> the data & status URBs vs. sending ABORT-TASK URBs.  The real question
> is when you decide that a command has completed, and I don't remember
> the details of the protocol well enough to say much more than that.
> 
> Anyway, if you fix these races you'll find that you have taken care of
> the pre-reset synchronization as well.
> 
> > I think that could be worked around by adding an atomic_t or using
> > set_bit for a flag in the cmnd_info structure.  I had noticed the rather
> > long comment above usb_stor_blocking_completion() in usb-storage, and
> > had intended to do something similar with the UAS driver.  If you have
> > any advice about how to work around this issue, it would be appreciated.
> 
> To tell the truth, usb-storage isn't a great model here.  It works, but
> I did most of my work on it before I was fully aware of all the issues
> involved in synchronizing multiple threads of control.  I wouldn't be
> at all suprised to find there are places where usb-storage is missing
> memory barriers or something of the sort.  Also, usb-storage handles
> only one command at a time, so it doesn't have to face the same
> complexity as UAS.
> 
> Since aborts are uncommon and you don't want to clutter up the fast
> path, it makes sense to use some form of RCU to synchronize aborts with
> uas_stat_cmplt().  Once that's done, the pre-reset routine merely has
> to abort all the commands currently in flight; the abort routine will
> take care of the synchronization.

If we can use the SCSI lock to safely set the resetting flag, and stop
uas_queuecommand_lck from submitting new commands until the bus reset
completes, then I think I can see where you're going with the abort
synchronization.

As you said, we want to abort any commands in flight, which means
stopping uas_do_work() from resubmitting partially queued URBs, and
stopping the status URB completion from submitting more USB 2.0 URBs in
uas_stat_cmplt.  Those would be RCU readers, looking at a flag in
uas_cmd_info, possibly a new "aborted" flag in the state variable.

The RCU writers would be uas_eh_abort_handler and
uas_pre_reset.  I know that you said the pre-reset function should just
call the abort handler and let it take care of the synchronization.
However, the abort handler needs the scsi_cmnd pointer, and right now
there's no list of outstanding commands.  The UAS driver was really a
fire-and-forget driver until I added the URB anchors.  So to be able to
call the abort handler, we'd need a list of outstanding commands, but
walking that list would race with the status completion function
that would take the finished commands off the list, and then we open a
whole new can of worms.

I'd have to think about this further to see if there's an easier way of
keeping track of commands.  I'm not even sure if there's enough room in
uas_cmd_info for a new list_head.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html