Re: [RFC 8/9] UAS: Cancel pending URBs and free streams in pre-reset.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 08, 2011 at 01:05:26PM -0500, Alan Stern wrote:
> On Wed, 7 Dec 2011, Sarah Sharp wrote:
> > The note also seems to imply that since there's only one thread that's
> > calling into the error handling methods, only one of the methods will be
> > called at a time.
> 
> No, that's not right.  There is only one error handler thread, but 
> other threads can call those methods.  For example, a user program can 
> ask the SCSI-generic driver to do a device, target, bus, or host reset.

Ah, ok, then the various abort methods can run at the same time, and a
host reset request may not necessarily mean that a particular command
failed to be aborted.  Now I understand why you said device reset
doesn't necessarily mean something is stuck in the command queue, it
could just be userspace resetting the device.

> > Your suggestion about how to synchronize the resetting flag involved
> > using the SCSI host lock.  But if the lock is taken by the SCSI core
> > before it calls eh_host_reset_handler and then another driver attempts
> > to reset the device and pre-reset tries to take the lock to sync the
> > resetting flag, then we have a deadlock.
> 
> Nope, the lock isn't held.  And of course, you don't have to have to 
> use the host lock if you don't want to.  Just put a private lock in 
> your devinfo structure.  You already have a lock for the workqueue 
> anyway.
> 
> (BTW, why is the devinfo structure allocated separately?  Is there any 
> reason not to embed it in the Scsi_Host structure, the way usb-storage 
> does?)

That was in Matthew's original patchset, so I'm not sure why.  I helped
out on the USB portions with the original driver, not the UAS/SCSI bits.

> > > >  It would be best if you wrote this patch yourself.
> > > > 
> > > > There's nothing strictly speaking wrong with the original patchset,
> > > > aside from the race condition between the abort handler and
> > > > uas_stat_cmplt, is there?
> > > 
> > > I don't know; I didn't look at the patches in any detail.
> > 
> > Ok, you didn't look at the whole patchset.  Did you look at this patch
> > in detail?
> 
> Probably not enough detail.  The description didn't seem right, so 
> there wasn't much point poring through the code.
> 
> On the whole it appears more or less okay.  There are a few small
> things.  For example, the srcu_struct should be embedded in devinfo
> rather than allocated dynamically.

Sure, I'll see if that will fit in devinfo.

> > What will the SCSI layer do if it tries to cancel a SCSI command, the
> > abort fails, but then the scsi_done function gets called with a
> > successful status?  I don't know, because I'm not a SCSI expert.
> 
> Once a command times out, scsi_done doesn't do anything.  I think you
> don't want the abort routine to return failure when a reset is in
> progress -- if it does, the SCSI error handler will start its own
> device reset.  (In fact, it looks like usb-storage returns the wrong
> value here.)

I see.

> Just wait until the command completes, one way or another, and return
> success.  Or maybe return FAST_IO_FAIL, which doesn't appear to be
> documented anywhere but is used in the SCSI error handler.

That could make things a bit complicated between pre-reset and the abort
handler.  Would it be unwise to make the abort handler wait on the reset
flag?  The commands will certainly be done by that point.  Would waiting
too long in the abort handler cause the error handling for other SCSI
devices to be held up for too long, since there's only one error
handling thread?

> > From a UAS perspective, I think we're fine.  If the completion handler
> > is running at the same time as the abort handler, we'll issue an ABORT
> > TASK command for that tag.  If the device has completed the tag, it will
> > just respond with an error status to the ABORT TASK and continue on its
> > way.  We're in abort handler and that note says we won't see new
> > commands issued while we're in an error handling function.  That means
> > even if the tag completes and the UAS driver marks it as being free for
> > new commands, we won't issue a new command with that tag until the abort
> > handler returns.  So the UAS driver can't ever accidentally abort a tag
> > it didn't mean to abort.
> > 
> > If the ABORT TASK does return successfully, then we know the completion
> > handler for the status URB is never going to run, and we just kill them.
> 
> (If you kill an URB, its completion handler does nevertheless run.)
>
> The only thing to watch out for would be if the completion handler gets
> rid of some data structure while the abort handler is still using it.  
> If that's not an issue in uas then there's nothing to worry about.

I expected that the completion handler would run for a killed URB, and
made sure the handlers would free the URBs and their buffers regardless
of the URB status.  The abort task handler only frees URBs that were
never submitted (the ones that were allocated but sitting in the
workqueue).  So I think we should be fine.

> > (BTW, I realized how awful it sounds to talk about killing URBs in
> > public when I was trying to explain this issue with my husband in a
> > restaurant last night.  I said "I just don't care about them, so we'll
> > kill them off" a little too loud.)
> 
> Did you get any funny looks from other people?  :-)

I busied myself with eating sushi and tried not to notice if I did. :)

> > > What you should do is break the problem into two pieces.  Work on 
> > > aborting first, and then add resetting.  I suspect aborting may be more 
> > > complex then resetting.
> > > 
> > > For aborting, the races to worry about are abort/resubmit and
> > > abort/complete.  As mentioned above, abort/resubmit could be handled
> > > via SRCU but abort/complete may need some sort of lock.
> 
> In short, the abort handler probably shouldn't return until the command
> has completed, either successfully or unsuccessfully.  (My personal
> impression is that the design of the SCSI and block layers is far too
> baroque; parts of it -- like this one -- ought to be simplified.)
> 
> > Is there any problem with using two SRCU reader locks in the same
> > driver?
> 
> No.
> 
> >  Can a function take an SRCU reader lock (say for the resetting
> > flag) and also synchronize the readers for another flag (say the command
> > abort flag)?
> 
> Yes.  But it mustn't call synchronize_srcu() if it's holding the 
> corresponding read lock, obviously.

Ok, that gives me some more options then.

> > Faced with this sort of race condition, and a lack of knowledge about
> > how the SCSI layers work with the scsi_host lock, this is where I start
> > walking in mental circles, trying to devise a solution that doesn't
> > involve the original RCU synchronization between the pre-reset function
> > and the abort handler.
> 
> I hope things are a little clearer now.

Yes, much clearer, thank you.  I'll have to think about this for a bit
before I come up with a better patchset.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux