Re: Logic errors involving QH_STATE_COMPLETING

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wednesday 22 July 2009, Alan Stern wrote:
> On Wed, 22 Jul 2009, David Brownell wrote:
> 
> > > And yet if the completion routine submits or unlinks more URBs, we
> > > _will_ have to modify the QH.
> > 
> > Not always; or usually.  As a rule only the TD list changes.  The QH
> > only needs to change when that list is empty ... and such updates are
> > racey, which is why they're localized.  (One thing to watch out for
> > is that the HC runs from a *cached* copy of the first TD, which it
> > writes into the QH overlay.  It's not written back to the TD unless
> > that TD completes "normally".)
> 
> While a QH might need all sort of complicated manipulations, right
> now I'm concerned only about linking and unlinking.

Exactly.  You can't escape those cases for any "live" QH.
Those "complicated manipulations" have to be the standard
case, or you will corrupt other data transfers.


> > > The real problem (or at least, part of the problem) is that the QH's 
> > > true state is obscured.  If other code sees QH_STATE_COMPLETING then it 
> > > can't tell whether the QH is really linked, unlinked, or in between.
> > 
> > It must not care.  There's only one place -- qh_completions() -- that
> > expects to address all the hardware races.
> 
> Strictly speaking, that's not true.  There's a whole set of
> interrelated routines (not to mention a special-purpose timer!) to
> handle the racy business of unlinking async QHs.

And they all rely on qh_completions() to make sure all the
state for the QH is handled properly ...

 
> > > Right.  Since unlink_async() doesn't know whether the QH is truly
> > > linked or not, it doesn't know what to do.
> > 
> > It knows what it should do -- force the scan of the whole TD list,
> > which is a side effect of QH unlink -- but doesn't make that happen.
> 
> No -- you're talking about what ehci_urb_dequeue() should do.

It expects unlink_async() to do that, by side effect.  :)


> If the QH is currently idle then ehci_urb_dequeue() doesn't need to
> call unlink_async() in the first place; it only needs to force a rescan
> (your (b) case below).

No; if it's currently idle it's also currently empty, so
nothing would need to be done.  (This should never happen,
since there'd be no URBs to unlink...) There's other code
ensuring that if the TD list is non-empty (even after scanning
for unlinks) when a QH unlink completes, it's immediately
relinked.


> It doesn't even need to do that if the QH is in 
> the middle of unlinking because, as you say, it will be scanned when
> the unlink finishes.

Right.  Also, that's not owned by qh_completions() so
neither (a) nor (b) applies.


> If the QH _is_ currently linked then ehci_urb_dequeue() needs to unlink
> it -- forcing a rescan isn't enough (your (a) case).  For this it does
> need to call unlink_async(), which does need to initiate the unlink.

Except for the (b) subcase dealing with a halted QH that's
still linked ...

 
> > > > What's needed is a way to *force* the qh_completions() logic to either
> > > > (a) start a QH unlink, if it would restore to QH_STATE_LINKED and the
> > > > QH wasn't halted, else (b) do an extra scan, in case it already looked
> > > > at that URB on that scan.
> > > 
> > > "(b) do an extra scan" sounds good.  These cases I'm concerned about
> > > are a little exceptional, so the overhead won't matter.
> > 
> > It needs to handle both cases.  The (b) case presumes that the QH
> > isn't live; but if it's completing, it's usually live.
> 
> Why does qh_completions() need to be forced into starting a QH unlink?

In (a), a QH unlink is required because the QH is live (HC is
scanning and updating it).  You can't just munge a TD list the
HC may be modifying!!

If you're asking why it should be in qh_completions() rather
than say scan_async(), that's a different issue.  In fact I
think scan_async() is a better spot; it's already got logic
to make that happen.


> What's wrong with just letting unlink_async() do its job, even if it
> gets called while an URB is being given back?

Because it doesn't know whether the QH is live/(a) or not/(b).

 
> Are you concerned about the possibility of unlinking a halted QH?  
> That can happen anyway, since a QH can halt while an unlink is in
> progress. 

So obviously I'm not concerned with that...


> Is there some other unintended interaction to watch out for? 

I'm concerned about needless changes to that logic, since it's
so fiddley and it's *very* easy to trigger hardware races that
can be a real PITA to track down.

 
> Bear in mind that with the change I'm proposing, qh_completions() would 
> no longer need to save qh->qh_state in a local variable or restore it 
> later.  Any changes to the QH's state made while the routine was giving 
> back an URB would remain visible.

I don't follow what you're saying.  No matter what, there needs
to be some kind of "please re-scan this QH" flag, and a way to
prevent things like modifying a QH that just *looks* empty but
isn't quite done with qh_completions() processing.


> > > ehci_urb_dequeue() will call qh_completions() when an interrupt URB is 
> > > unlinked.  While that URB is being given back, an IRQ might occur and 
> > > scan_periodic() might then call qh_completions() for the same endpoint.
> > 
> > If so, bug.  But how would that IRQ processing get applied to a
> > QH that's unlinked from the schedule used by the IRQ processing?
> 
> You're right, it wouldn't.  Still, a little defensive programming won't
> hurt; there are a lot of possible paths for calling qh_completions.

Sure, have it verify on entry that it's not re-entering (different
CPU or whatever).


> And I'm going to add one: dequeuing an URB on a QH that has to be IDLE
> because it's waiting for a Clear-TT-Buffer to finish.
> 
> > What's the scenario for endpoint reset goofing the toggle?
> 
> An URB's completion routine calls usb_reset_endpoint().  When 
> ehci_endpoint_reset() runs, it will see that qh->qh_state is 
> QH_STATE_COMPLETING rather than QH_STATE_LINKED and so it won't 
> initiate an unlink.
> 
> As it turns out this doesn't matter, since the QH will be unlinked
> anyway once the giveback returns (its TD queue will be empty).

So there *is* no such scenario today; good.  


> Still, it's not good to rely on such side effects.

Yes and no.  I should have put "side effects" in quotes;
it's actually by design that state transitions always
force certain actions.  Which means they're not really
what are usually labeled as side effects.

 
> > > So I think it would be better to do away with 
> > > QH_STATE_COMPLETING entirely.
> > 
> > How about just starting by fixing the control/bulk case, without
> > touching that particular bit of logic?  The mechanism to force
> > such a scan is needed regardless, and is orthogonal to the rest.
> 
> We need to force either an unlink or a rescan, depending on whether or 
> not the QH is currently linked.  There's no way to tell which if the 
> state is COMPLETING.

Just have unlink_async() set a flag.  Maybe have qh_completions()
handle the (b) case if stopped; for sure, have scan_async() do the
honors for (a).  Clear the flag after handling it.

- Dave

> > Reworking the periodic schedule scan/link/unlink support is a
> > whole nother ball of worms.  One that doesn't neccessarily need
> > to involve changes to the QH (or xITD) state machines.
> 
> Yes, the changes to the periodic URB handling should be done 
> separately.
> 
> Alan Stern
> 
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux