Re: Logic errors involving QH_STATE_COMPLETING

David Brownell <david-b@xxxxxxxxxxx> · Wed, 22 Jul 2009 11:16:37 -0700

On Tuesday 21 July 2009, Alan Stern wrote:
> Dave:
> 
> It seems there are some errors in the way ehci-hcd handles
> QH_STATE_COMPLETING.  They are largely due to the fact that COMPLETING
> isn't a real state -- URBs can be given back either while a QH is
> linked or while it is unlinked (or even while it is in transition).

It's more of a transient lock, making it so only qh_completions()
may modify the QH even if ehci->lock was dropped ... because it's
got to drop that lock while giving URBs back through usbcore.

I do have a note in my patches to revisit the case of unlink_async()
seeing a QH in that state, which is one transition you ask about.

> For example, suppose the completion routine for a bulk or control URB 
> unlinks a later URB queued for the same endpoint.  Then 
> ehci_urb_dequeue will see that qh->qh_state is QH_STATE_COMPLETING, so 
> it will call unlink_async.  But when unlink_async sees that the state 
> isn't QH_STATE_LINKED, it will return almost immediately without doing 
> anything.  Thus qh never does get unlinked.

Maybe it does -- if the URB completed naturally, or the QH is either
unlinked or halted *and* qh_completions() hasn't already scanned that
particular URB.  But maybe it doesn't.  

What's needed is a way to *force* the qh_completions() logic to either
(a) start a QH unlink, if it would restore to QH_STATE_LINKED and the
QH wasn't halted, else (b) do an extra scan, in case it already looked
at that URB on that scan.

> Or suppose the completion routine for an interrupt URB unlinks a later
> URB queued for the same endpoint.  The "switch (qh->qh_state)"  
> statement in ehci_urb_dequeue will end up in the "default" case and
> will log a debugging message without doing anything else.

Good point; that's a pretty ugly bit of code... just like all the
code handling unlinks from the EHCI periodic schedule.

> Finally, qh_completions can be called from several different pathways.  
> It wouldn't be surprising to see it get called recursively on occasion.

I'd be surprised.  :)

If that happens, it's a bug.

> And since it doesn't test for QH_STATE_COMPLETING at the start, it 
> would run recursively, thereby messing up the list_for_each_safe() loop 
> through the qTDs.

... and potentially trashing the QH context, which can be a bit
fragile.  If you're concerned about that, just check and warn.

> Maybe instead of using QH_STATE_COMPLETING, a QH should contain a flag
> to indicate that its queue is being scanned and another flag to 
> indicate that a rescan will be needed.  What do you think?

That could work.  Certainly the "full scan needed" flag is needed
to address the control/bulk issue.  And it might help with cleaner
code to unlink interrupt transfers.  I'd leave the pseudo-state in
place for now though, 

- Dave

> 
> Alan Stern
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html