Re: Logic errors involving QH_STATE_COMPLETING

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 22 Jul 2009, David Brownell wrote:

> > And yet if the completion routine submits or unlinks more URBs, we
> > _will_ have to modify the QH.
> 
> Not always; or usually.  As a rule only the TD list changes.  The QH
> only needs to change when that list is empty ... and such updates are
> racey, which is why they're localized.  (One thing to watch out for
> is that the HC runs from a *cached* copy of the first TD, which it
> writes into the QH overlay.  It's not written back to the TD unless
> that TD completes "normally".)

While a QH might need all sort of complicated manipulations, right
now I'm concerned only about linking and unlinking.

> > The real problem (or at least, part of the problem) is that the QH's 
> > true state is obscured.  If other code sees QH_STATE_COMPLETING then it 
> > can't tell whether the QH is really linked, unlinked, or in between.
> 
> It must not care.  There's only one place -- qh_completions() -- that
> expects to address all the hardware races.

Strictly speaking, that's not true.  There's a whole set of
interrelated routines (not to mention a special-purpose timer!) to
handle the racy business of unlinking async QHs.

> > Right.  Since unlink_async() doesn't know whether the QH is truly
> > linked or not, it doesn't know what to do.
> 
> It knows what it should do -- force the scan of the whole TD list,
> which is a side effect of QH unlink -- but doesn't make that happen.

No -- you're talking about what ehci_urb_dequeue() should do.

If the QH is currently idle then ehci_urb_dequeue() doesn't need to
call unlink_async() in the first place; it only needs to force a rescan
(your (b) case below).  It doesn't even need to do that if the QH is in
the middle of unlinking because, as you say, it will be scanned when
the unlink finishes.

If the QH _is_ currently linked then ehci_urb_dequeue() needs to unlink
it -- forcing a rescan isn't enough (your (a) case).  For this it does
need to call unlink_async(), which does need to initiate the unlink.

> > > What's needed is a way to *force* the qh_completions() logic to either
> > > (a) start a QH unlink, if it would restore to QH_STATE_LINKED and the
> > > QH wasn't halted, else (b) do an extra scan, in case it already looked
> > > at that URB on that scan.
> > 
> > "(b) do an extra scan" sounds good.  These cases I'm concerned about
> > are a little exceptional, so the overhead won't matter.
> 
> It needs to handle both cases.  The (b) case presumes that the QH
> isn't live; but if it's completing, it's usually live.

Why does qh_completions() need to be forced into starting a QH unlink?  
What's wrong with just letting unlink_async() do its job, even if it
gets called while an URB is being given back?

Are you concerned about the possibility of unlinking a halted QH?  
That can happen anyway, since a QH can halt while an unlink is in
progress.  Is there some other unintended interaction to watch out for?

Bear in mind that with the change I'm proposing, qh_completions() would 
no longer need to save qh->qh_state in a local variable or restore it 
later.  Any changes to the QH's state made while the routine was giving 
back an URB would remain visible.


> > ehci_urb_dequeue() will call qh_completions() when an interrupt URB is 
> > unlinked.  While that URB is being given back, an IRQ might occur and 
> > scan_periodic() might then call qh_completions() for the same endpoint.
> 
> If so, bug.  But how would that IRQ processing get applied to a
> QH that's unlinked from the schedule used by the IRQ processing?

You're right, it wouldn't.  Still, a little defensive programming won't
hurt; there are a lot of possible paths for calling qh_completions.  
And I'm going to add one: dequeuing an URB on a QH that has to be IDLE
because it's waiting for a Clear-TT-Buffer to finish.

> What's the scenario for endpoint reset goofing the toggle?

An URB's completion routine calls usb_reset_endpoint().  When 
ehci_endpoint_reset() runs, it will see that qh->qh_state is 
QH_STATE_COMPLETING rather than QH_STATE_LINKED and so it won't 
initiate an unlink.

As it turns out this doesn't matter, since the QH will be unlinked
anyway once the giveback returns (its TD queue will be empty).  Still, 
it's not good to rely on such side effects.


> > So I think it would be better to do away with 
> > QH_STATE_COMPLETING entirely.
> 
> How about just starting by fixing the control/bulk case, without
> touching that particular bit of logic?  The mechanism to force
> such a scan is needed regardless, and is orthogonal to the rest.

We need to force either an unlink or a rescan, depending on whether or 
not the QH is currently linked.  There's no way to tell which if the 
state is COMPLETING.

> Reworking the periodic schedule scan/link/unlink support is a
> whole nother ball of worms.  One that doesn't neccessarily need
> to involve changes to the QH (or xITD) state machines.

Yes, the changes to the periodic URB handling should be done 
separately.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux