Re: tuning EHCI_TUNE_CERR

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Thu, 21 Feb 2013 10:33:22 -0500 (EST)

On Wed, 20 Feb 2013, Haribabu Narayanan wrote:

> >> - Do EHCI_TUNE_CERR and QH_XACTERR_MAX apply to exactly the same
> >> set of bus-level errors (namely the single error: XactErr)?  In
> >> other words, are the errors that the EHCI driver retries in software
> >> under QH_XACTERR_MAX the only ones that benefit from setting
> >> EHCI_TUNE_CERR to a value > 1?
> >
> > Yes, I believe so.  It would be necessary to read through the EHCI spec
> > very closely to make sure, however.  You'd have to check that in every
> > situation where the error counter decrements to 0, the transaction
> > fails with XactErr status.
> 
> Probably difficult to check this as it is not possible to simulate all
> kinds of bus errors.

Checking the hardware implementation is impossible.  What I meant was 
you could check the spec.

> But yes, the EHCI spec states that CERR is decremented only for
> Transaction Errors (3.5.3 qTD Token); Babble and Data Buffer Error are
> excluded from "Transaction Errors" here.  For all the other errors, for
> which CERR is not decremented, I think setting CERR = 1 or CERR = 3
> makes no difference.  Given all this, I was wondering if it is safe to
> conclude that if the TD is halted purely because CERR went from 1 to 0,
> the cause can only be an XactErr, which means XactErr is going to be
> flagged.  That would directly mean nothing is lost in terms of overall
> retries by setting EHCI_TUNE_CERR = 1.
> 
> This involves a lot of inferences though and hence I wasn't so sure.

You don't have to make any inferences.  Just look through the whole 
spec and make sure that every place where it says CERR is decremented, 
it also says the XactErr status bit gets set.
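The property to confirm against the spec can be written down as a small predicate.  This is only a sketch: the macro and function names are local to the example (they are not the driver's), and the bit positions are taken from the qTD Token layout in section 3.5.3 as I understand it (CERR in bits 11:10, Halted in bit 6, XactErr in bit 3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* qTD token fields; names are local to this sketch (assumption). */
#define TOK_CERR(tok)   (((tok) >> 10) & 0x3u)  /* error counter, bits 11:10 */
#define TOK_STS_HALT    (1u << 6)               /* Halted */
#define TOK_STS_DBE     (1u << 5)               /* Data Buffer Error */
#define TOK_STS_BABBLE  (1u << 4)               /* Babble Detected */
#define TOK_STS_XACT    (1u << 3)               /* Transaction Error (XactErr) */

/* Build a token from a CERR value and a set of status bits. */
static uint32_t make_token(unsigned cerr, uint32_t status)
{
	assert(cerr <= 3);
	return ((uint32_t)cerr << 10) | (status & 0xffu);
}

/*
 * True if the token contradicts the claim "whenever the error counter
 * has run down to 0 and the qTD is halted, XactErr is set".
 */
static bool cerr_exhausted_without_xacterr(uint32_t tok)
{
	return (tok & TOK_STS_HALT) != 0
	       && TOK_CERR(tok) == 0
	       && (tok & TOK_STS_XACT) == 0;
}
```

If the spec never allows a token for which this predicate is true (with CERR initially nonzero), then lowering EHCI_TUNE_CERR cannot hide an error from the software retry path.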

> >> - We are contemplating reducing EHCI_TUNE_CERR from 3 to 1 (while
> >> keeping QH_XACTERR_MAX the same).  This helps us because SW is
> >> involved with subsequent retries and there is a finite amount of delay
> >> involved there.  What are the effects on the system if we do this?  I
> >> can think of the following few:
> >>
> >> (a) Possible reduced throughput in case the USB device responds poorly
> >
> > Yes.
> >
> >> (b) Possibility of more frequent interrupts to the system during the
> >> duration where XactErrs are encountered
> >
> > This will be a very small effect.  Most likely all three retries would
> > occur during the same microframe anyway, and interrupts occur only at
> > microframe boundaries.
> 
> Good point.  But looking through the code and the spec, it looks like
> the default interrupt threshold is 8 microframes (1 ms), and the code
> doesn't seem to touch it?

Read the stuff related to log2_irq_thresh in ehci-hcd.c.
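For orientation, here is a rough sketch of how a log2-style threshold maps onto the Interrupt Threshold Control field (USBCMD bits 23:16, counted in microframes).  The function names are made up for the example, and the clamping of out-of-range values to 0 is my recollection of what the driver does, so treat it as an assumption and check ehci-hcd.c itself.

```c
#include <assert.h>
#include <stdint.h>

#define CMD_ITC_SHIFT 16  /* Interrupt Threshold Control, USBCMD bits 23:16 */

/* USBCMD bits for a threshold of (1 << log2_thresh) microframes. */
static uint32_t itc_bits(int log2_thresh)
{
	/* Assumed behaviour: out-of-range values fall back to 0. */
	if (log2_thresh < 0 || log2_thresh > 6)
		log2_thresh = 0;
	assert(log2_thresh >= 0 && log2_thresh <= 6);
	return (uint32_t)1 << (CMD_ITC_SHIFT + log2_thresh);
}

/* Threshold in microframes encoded in a USBCMD value. */
static unsigned itc_microframes(uint32_t usbcmd)
{
	return (usbcmd >> CMD_ITC_SHIFT) & 0xffu;
}
```

Under these assumptions a value of 0 gives a threshold of 1 microframe, and a value of 3 gives the 8 microframes (1 ms) that the spec lists as the hardware default.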

> >> (c) Possible impact on robustness (when dealing with badly behaving
> >> devices or bad bus conditions), as the effective number of retries is
> >> now reduced from 32*3 to 32*1.  We can surely counter this by
> >> increasing QH_XACTERR_MAX to (32*3).
> >
> > Again unlikely to have much effect.  If the device or bus is that badly
> > behaved or that noisy, you probably shouldn't be using it at all.
> >
> >> Are there any more that we are missing?
> >
> > I can't think of any.  Have you tried making this change?  Does it
> > really help at all?
> 
> Yes.  We seem to have run into a silicon bug where very frequent
> retries cause problems.  I have verified that setting CERR = 1 helps;
> however, I wasn't sure how sound that approach is.

If it works for you, go ahead and use it.
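The retry arithmetic from point (c) can be spelled out as follows.  This is a simplified model, not driver code: it uses the 32*3 and 32*1 figures from the discussion and treats the total as (hardware attempts per round) times (software rounds).  The helper names are invented for the example.

```c
#include <assert.h>

/* Figures from the discussion above. */
enum { XACTERR_MAX = 32, CERR_CURRENT = 3, CERR_PROPOSED = 1 };

/* Approximate total attempts: hardware tries per round times software rounds. */
static unsigned retry_budget(unsigned cerr, unsigned sw_rounds)
{
	assert(cerr >= 1 && cerr <= 3);
	return cerr * sw_rounds;
}

/* Software rounds needed to reach a given total with a given CERR. */
static unsigned sw_rounds_for_budget(unsigned cerr, unsigned budget)
{
	assert(cerr >= 1 && cerr <= 3);
	return (budget + cerr - 1) / cerr;  /* round up */
}
```

With CERR at 3 the model gives 96 attempts, with CERR at 1 it gives 32, and keeping the original total at CERR = 1 would need a software limit of 96.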

Alan Stern
