Re: Logic errors involving QH_STATE_COMPLETING

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wednesday 22 July 2009, Alan Stern wrote:
> On Wed, 22 Jul 2009, David Brownell wrote:
> 
> > On Tuesday 21 July 2009, Alan Stern wrote:
> > > Dave:
> > > 
> > > It seems there are some errors in the way ehci-hcd handles
> > > QH_STATE_COMPLETING.  They are largely due to the fact that COMPLETING
> > > isn't a real state -- URBs can be given back either while a QH is
> > > linked or while it is unlinked (or even while it is in transition).
> > 
> > It's more of a transient lock, making it so only qh_completions()
> > may modify the QH even if ehci->lock was dropped ... because it's
> > got to drop that lock while giving URBs back through usbcore.
> 
> And yet if the completion routine submits or unlinks more URBs, we
> _will_ have to modify the QH.

Not always; or usually.  As a rule only the TD list changes.  The QH
only needs to change when that list is empty ... and such updates are
racey, which is why they're localized.  (One thing to watch out for
is that the HC runs from a *cached* copy of the first TD, which it
writes into the QH overlay.  It's not written back to the TD unless
that TD completes "normally".)


> There's no need to worry about the QH getting unlinked while an URB is
> being given back.  A comment in scan_async() notes the possibility and
> the code is prepared for it.
> 
> The real problem (or at least, part of the problem) is that the QH's 
> true state is obscured.  If other code sees QH_STATE_COMPLETING then it 
> can't tell whether the QH is really linked, unlinked, or in between.

It must not care.  There's only one place -- qh_completions() -- that
expects to address all the hardware races.

The issue you've noted is that there are some uncommon code paths where
it's not told about some work it needs to do.  So that work might not
get done.  The normal way to make that work happen involves side
effects of QH state changes.  

 
> > I do have a note in my patches to revisit the case of unlink_async()
> > seeing a QH in that state, which is one transition you ask about.
> 
> Right.  Since unlink_async() doesn't know whether the QH is truly
> linked or not, it doesn't know what to do.

It knows what it should do -- force the scan of the whole TD list,
which is a side effect of QH unlink -- but doesn't make that happen.

 
> > > For example, suppose the completion routine for a bulk or control URB 
> > > unlinks a later URB queued for the same endpoint.  Then 
> > > ehci_urb_dequeue will see that qh->qh_state is QH_STATE_COMPLETING, so 
> > > it will call unlink_async.  But when unlink_async sees that the state 
> > > isn't QH_STATE_LINKED, it will return almost immediately without doing 
> > > anything.  Thus qh never does get unlinked.
> > 
> > Maybe it does -- if the URB completed naturally, or the QH is either
> > unlinked or halted *and* qh_completions() hasn't already scanned that
> > particular URB.  But maybe it doesn't.  
> > 
> > What's needed is a way to *force* the qh_completions() logic to either
> > (a) start a QH unlink, if it would restore to QH_STATE_LINKED and the
> > QH wasn't halted, else (b) do an extra scan, in case it already looked
> > at that URB on that scan.
> 
> "(b) do an extra scan" sounds good.  These cases I'm concerned about
> are a little exceptional, so the overhead won't matter.

It needs to handle both cases.  The (b) case presumes that the QH
isn't live; but if it's completing, it's usually live.


> > > Finally, qh_completions can be called from several different pathways.  
> > > It wouldn't be surprising to see it get called recursively on occasion.
> > 
> > I'd be surprised.  :)
> > 
> > If that happens, it's a bug.
> 
> ehci_urb_dequeue() will call qh_completions() when an interrupt URB is 
> unlinked.  While that URB is being given back, an IRQ might occur and 
> scan_periodic() might then call qh_completions() for the same endpoint.

If so, bug.  But how would that IRQ processing get applied to a
QH that's unlinked from the schedule used by the IRQ processing?

 
> > > And since it doesn't test for QH_STATE_COMPLETING at the start, it 
> > > would run recursively, thereby messing up the list_for_each_safe() loop 
> > > through the qTDs.
> > 
> > ... and potentially trashing the QH context, which can be a bit
> > fragile.  If you're concerned about that, just check and warn.
> > 
> > 
> > > Maybe instead of using QH_STATE_COMPLETING, a QH should contain a flag
> > > to indicate that its queue is being scanned and another flag to 
> > > indicate that a rescan will be needed.  What do you think?
> > 
> > That could work.  Certainly the "full scan needed" flag is needed
> > to address the control/bulk issue.  And it might help with cleaner
> > code to unlink interrupt transfers.  I'd leave the pseudo-state in
> > place for now though, 
> 
> According to grep, the only place we look for COMPLETING is in the
> control/bulk case of ehci_urb_dequeue().  That needs to be changed
> anyway.

The interrupt transfer cases should be more intelligent, for example.

 
> Of course, there may be other, implicit uses -- i.e., where the code
> checks for other values and the comparisons fail because the value is
> COMPLETING.  There are a few places where that can happen, such as in
> unlink_async() and ehci_endpoint_reset().  As far as I can tell, they
> are all buggy. 

What's the scenario for endpoint reset goofing the toggle?


> So I think it would be better to do away with 
> QH_STATE_COMPLETING entirely.

How about just starting by fixing the control/bulk case, without
touching that particular bit of logic?  The mechanism to force
such a scan is needed regardless, and is orthogonal to the rest.

Reworking the periodic schedule scan/link/unlink support is a
whole nother ball of worms.  One that doesn't neccessarily need
to involve changes to the QH (or xITD) state machines.

- Dave


> 
> Alan Stern
> 
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux