Re: [RFC 8/9] UAS: Cancel pending URBs and free streams in pre-reset.

Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> · Wed, 7 Dec 2011 15:31:42 -0800

On Wed, Dec 07, 2011 at 01:13:42PM -0500, Alan Stern wrote:
> On Wed, 7 Dec 2011, Sarah Sharp wrote:
> 
> > > Here's one way to do it.  Have two new flags: one for reset-in-progress 
> > > and one for abort-in-progress.  If the reset-in-progress flag is set, 
> > > then uas_do_work and uas_stat_cmplt would know not to submit any URBs 
> > > at all.  If that flag isn't set but the other one is, they would know 
> > > to check whether the particular command they are working on at the 
> > > moment is being aborted.
> > > 
> > > Then all you need is some way to tell when all the outstanding commands
> > > have completed.  For example, an atomic counter of outstanding
> > > commands.  When the counter goes to 0, if the reset-in-progress flag is
> > > set then you signal the pre-reset thread that it may proceed (use a
> > > struct completion or something like that).
> > 
> > Ok, I've tried to wrap my head around this strategy all day yesterday,
> > and I can't see how it can be done without races.
> 
> What races?

A couple different ones, there's an example at the end of this email.

> >  I feel like I'm going
> > in mental circles, and missing information about how the SCSI layer
> > works.
> 
> What information?

I'm trying to figure out which functions in the UAS driver are entered
with the SCSI host lock taken by the SCSI core.  You've already said
that the function to queue commands will have the SCSI host lock held.
There's an interesting note in include/scsi/scsi_host.h:

	/*
	 * This is an error handling strategy routine.  You don't need to
	 * define one of these if you don't want to - there is a default
	 * routine that is present that should work in most cases.  For those
	 * driver authors that have the inclination and ability to write their
	 * own strategy routine, this is where it is specified.  Note - the
	 * strategy routine is *ALWAYS* run in the context of the kernel eh
	 * thread.  Thus you are guaranteed to *NOT* be in an interrupt
	 * handler when you execute this, and you are also guaranteed to
	 * *NOT* have any other commands being queued while you are in the
	 * strategy routine. When you return from this function, operations
	 * return to normal.
	 *
	 * See scsi_error.c scsi_unjam_host for additional comments about
	 * what this function should and should not be attempting to do.
	 *
	 * Status: REQUIRED	(at least one of them)
	 */
	int (* eh_abort_handler)(struct scsi_cmnd *);
	int (* eh_device_reset_handler)(struct scsi_cmnd *);
	int (* eh_target_reset_handler)(struct scsi_cmnd *);
	int (* eh_bus_reset_handler)(struct scsi_cmnd *);
	int (* eh_host_reset_handler)(struct scsi_cmnd *);

Does that note mean that the SCSI host lock is held when
eh_abort_handler and eh_host_reset_handler are called?

The note also seems to imply that since there's only one thread that's
calling into the error handling methods, only one of the methods will be
called at a time.  If the SCSI lock is held when those methods are
called, and only one of them is called at once, then that loosens the
constraints that I have to place in the UAS driver.

I asked James Bottomley on IRC about it yesterday, but he didn't answer.

Your suggestion about how to synchronize the resetting flag involved
using the SCSI host lock.  But if the lock is taken by the SCSI core
before it calls eh_host_reset_handler and then another driver attempts
to reset the device and pre-reset tries to take the lock to sync the
resetting flag, then we have a deadlock.

> >  It would be best if you wrote this patch yourself.
> > 
> > There's nothing strictly speaking wrong with the original patchset,
> > aside from the race condition between the abort handler and
> > uas_stat_cmplt, is there?
> 
> I don't know; I didn't look at the patches in any detail.

Ok, you didn't look at the whole patchset.  Did you look at this patch
in detail?

> >  You've expressed that you think it's a bit
> > heavy-handed, because it kills the commands on reset, but Oliver has
> > also expressed the opinion that when another driver wants a reset on a
> > composite device, it often means there's something wrong with the other
> > interface.
> > 
> > If we find out that it is an improvement to flush rather than kill
> > commands on reset, that patch can always be added later.  The patchset
> > as it was submitted is an improvement over the current code, since it
> > adds device error handling where there was none before.
> 
> Sure.  Killing everything at once is okay for now.  Don't worry about 
> that.

Ok. :)

> > The simplest solution (and the one I understand) to make the original
> > patchset race-free is to add an aborting flag to the uas_command_info
> > structure, make the abort handler set it before it takes the work queue
> > lock, and rearrange the function that is called by uas_stat_cmplt such
> > that it takes the uas_work_lock before submitting URBs and checks the
> > flag:
> > 
> > static int uas_xfer_data(struct urb *urb, struct scsi_cmnd *cmnd,
> >                 struct uas_dev_info *devinfo, unsigned direction)
> > {
> >         struct uas_cmd_info *cmdinfo = (void *)&cmnd->SCp;
> >         int err, srcu_idx;
> > 
> >         srcu_idx = srcu_read_lock(devinfo->srcu);
> >         if (atomic_read(&devinfo->resetting)) {
> >                 srcu_read_unlock(devinfo->srcu, srcu_idx);
> >                 return -ENODEV;
> >         }
> > 
> >         cmdinfo->state = direction | SUBMIT_STATUS_URB;
> > 	spin_lock(&uas_work_lock);
> > 	/* XXX: check the aborting flag here. */
> >         err = uas_submit_urbs(cmnd, cmnd->device->hostdata, GFP_ATOMIC);
> >         if (err) {
> >                 list_add_tail(&cmdinfo->list, &uas_work_list);
> >                 spin_unlock(&uas_work_lock);
> >                 schedule_work(&uas_work);
> >         } else {
> >                 spin_unlock(&uas_work_lock);
> > 	}
> >         srcu_read_unlock(devinfo->srcu, srcu_idx);
> >         return 0;
> > }
> > 
> > That does mean that the USB 2.0 UAS fast path will have to take a lock
> > on every submit, which is not ideal, but usb-storage also takes a mutex,
> > so it's a wash.  I'm not convinced the USB 2.0 UAS device will get much
> > better performance than BoT anyway.  If performance testing shows that
> > the spinlock is the bottleneck, I can re-look at the solution.
> 
> Aborting is tricky.  You have to synchronize it with both URB 
> submission and command completion.  SRCU can take care of the URB 
> submission part, but for command completion you'll need something 
> stronger.  The race to worry about is what happens when the error 
> handler gets invoked at the same time as the command's last URB 
> completes and you call scsi_done.  Maybe this race is harmless for uas, 
> I don't know.  But if you don't want to risk trying to abort a command 
> that has already completed, it may well turn out that a spinlock 
> is the only way to handle it.

What will the SCSI layer do if it tries to cancel a SCSI command, the
abort fails, but then the scsi_done function gets called with a
successful status?  I don't know, because I'm not a SCSI expert.

>From a UAS perspective, I think we're fine.  If the completion handler
is running at the same time as the abort handler, we'll issue an ABORT
TASK command for that tag.  If the device has completed the tag, it will
just respond with an error status to the ABORT TASK and continue on its
way.  We're in abort handler and that note says we won't see new
commands issued while we're in an error handling function.  That means
even if the tag completes and the UAS driver marks it as being free for
new commands, we won't issue a new command with that tag until the abort
handler returns.  So the UAS driver can't ever accidentally abort a tag
it didn't mean to abort.

If the ABORT TASK does return successfully, then we know the completion
handler for the status URB is never going to run, and we just kill them.

(BTW, I realized how awful it sounds to talk about killing URBs in
public when I was trying to explain this issue with my husband in a
restaurant last night.  I said "I just don't care about them, so we'll
kill them off" a little too loud.)

> > The good news is that the USB 3.0 UAS fast path does not include
> > uas_xfer_data, so it will only see the RCU readlock.  USB 3.0 UAS is
> > where I expect the performance of the code to matter the most.
> > 
> > So is there anything horrendously objectionable about closing the
> > abort/resubmit race condition with a spinlock and using the RCU
> > implementation from the original patch?
> 
> What you should do is break the problem into two pieces.  Work on 
> aborting first, and then add resetting.  I suspect aborting may be more 
> complex then resetting.
> 
> For aborting, the races to worry about are abort/resubmit and
> abort/complete.  As mentioned above, abort/resubmit could be handled
> via SRCU but abort/complete may need some sort of lock.

Is there any problem with using two SRCU reader locks in the same
driver?  Can a function take an SRCU reader lock (say for the resetting
flag) and also synchronize the readers for another flag (say the command
abort flag)?

> For resetting, the races to worry about are reset/queuecommand and
> waiting for all outstanding commands to complete.  The queuecommand
> race could be handled with either regular RCU or a spinlock.  Either a 
> private lock or the host lock (you know, uas_queuecommand doesn't 
> have to be called with the host lock held if you don't want it).

Do you mean I can ask the SCSI core to not hold the host lock when it
calls into the queue function?  Or is that always true and the driver
needs to grab the scsi host lock?

> Waiting for the outstanding commands to complete can be done without 
> locking if you don't mind wasting a little time in pre_reset.  All you 
> have to do is test an atomic commands_in_flight counter, and if it is 
> non-zero then sleep for a few tens of milliseconds and try again.

There's also a race condition between the pre-reset function and the
abort handler, since the abort handler can submit URBs to issue the
ABORT TASK command.

Say there's an atomic counter, and a resetting flag.

 1. The abort handler starts to run, and doesn't see the resetting flag,
    but gets scheduled before it can submit the ABORT TASK command and
    increment the number of outstanding commands.

 2. The completion handler for the status URB runs and decrements the
    count to zero.

 3. Meanwhile, another driver asks for a reset, and pre-reset is called,
    which finds the outstanding count zero.  Before it can tear down the
    stream rings, the context is swapped out for the abort handler thread.

 4. The abort handler thread submits the ABORT TASK command and waits on
    it.  At this point, if the pre-reset handler is swapped back in,
    it's going to happily free the stream rings while the abort task
    command is still on them.

Faced with this sort of race condition, and a lack of knowledge about
how the SCSI layers work with the scsi_host lock, this is where I start
walking in mental circles, trying to devise a solution that doesn't
involve the original RCU synchronization between the pre-reset function
and the abort handler.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html