Re: tuning EHCI_TUNE_CERR

Haribabu Narayanan <hari.maillist@xxxxxxxxx> · Wed, 20 Feb 2013 14:27:01 -0800

Thanks Alan.  Replies inline.

On Wed, Feb 20, 2013 at 7:30 AM, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, 20 Feb 2013, Haribabu Narayanan wrote:
>
>> Hi all,
>>
>>    We are facing an issue in one of our platforms which seems to be
>> indirectly related to how close USB transactions are attempted (in
>> case of failed transactions).  In the EHCI layer, we have these two
>> defines that deal with retries for failed USB transactions:
>>
>> #define EHCI_TUNE_CERR          3
>> #define QH_XACTERR_MAX          32
>>
>>    Would appreciate any help with the following questions:
>>
>> - Are EHCI_TUNE_CERR and QH_XACTERR_MAX applicable to exactly the same
>> set of bus-level errors (namely the single error: XactErr) ?  Or in
>> other words, are only the errors that are retried in EHCI-software by
>> using QH_XACTERR_MAX benefiting by the use of EHCI_TUNE_CERR set to a
>> value > 1?
>
> Yes, I believe so.  It would be necessary to read through the EHCI spec
> very closely to make sure, however.  You'd have to check that in every
> situation where the error counter decrements to 0, the transaction
> fails with XactErr status.

This is probably difficult to check in practice, since it is not
possible to simulate every kind of bus error.

But yes, the EHCI spec states that CERR is decremented only for
Transaction Errors (section 3.5.3, qTD Token); Babble and Buffer errors
are excluded from "Transaction Errors" here.  For all the other errors,
where CERR is not decremented, I think setting CERR to 1 or to 3 makes
no difference.  Given all this, I was wondering if it is safe to
conclude that if the TD is halted purely because CERR went from 1 to 0,
the cause can only be an XactErr, which means XactErr is going to be
flagged.  That would directly mean nothing is lost in terms of overall
retries by setting EHCI_TUNE_CERR to 1.

This involves a lot of inferences though and hence I wasn't so sure.
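
To make the retry arithmetic from this thread concrete, here is a
minimal sketch.  It is a simplified model, not the driver's code: it
assumes every attempt fails with XactErr, that the hardware makes CERR
attempts before halting the TD, and that software re-arms the TD up to
QH_XACTERR_MAX times in total.  The helper name worst_case_attempts()
is hypothetical.

```c
#include <assert.h>

#define EHCI_TUNE_CERR   3    /* hardware error counter loaded into the qTD */
#define QH_XACTERR_MAX   32   /* software limit on XactErr rounds */

/*
 * Hypothetical helper: worst-case number of bus attempts before the
 * transfer is given up, under the simplified model described above
 * (every attempt fails, each software round reloads CERR).
 */
static unsigned int worst_case_attempts(unsigned int cerr,
                                        unsigned int xacterr_max)
{
    /* CERR is a 2-bit field; 0 means "do not count errors", so the
     * model only makes sense for values 1..3. */
    assert(cerr >= 1 && cerr <= 3);
    return cerr * xacterr_max;
}
```

Under these assumptions, worst_case_attempts(EHCI_TUNE_CERR,
QH_XACTERR_MAX) gives 96, dropping CERR to 1 gives 32, and raising
QH_XACTERR_MAX to 96 with CERR at 1 brings the total back to 96.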

>
>> - We are contemplating reducing EHCI_TUNE_CERR  from 3 to 1 (while
>> keeping QH_XACTERR_MAX the same).  This helps us because SW is
>> involved with subsequent retries and there is a finite amount of delay
>> involved there.  What are the effects on the system if we do this ?  I
>> can think of the following few :
>>
>> (a) Possible reduced throughput in case the USB device responds poorly
>
> Yes.
>
>> (b) Possibility of more frequent interrupts to the system during the
>> duration where XactErrs are encountered
>
> This will be a very small effect.  Most likely all three retries would
> occur during the same microframe anyway, and interrupts occur only at
> microframe boundaries.

Good point.  But looking through the code and the spec, it looks like
the default interrupt threshold is 8 microframes (1 ms), and the code
doesn't seem to change it?
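
For reference, a rough sketch of how the interrupt threshold maps to
time, assuming the EHCI USBCMD layout where bits 23:16 hold the
Interrupt Threshold Control value in microframes (125 us each).  The
macro and function names below are hypothetical, chosen for
illustration, and are not taken from the driver.

```c
#include <stdint.h>

/* Assumed layout: Interrupt Threshold Control in USBCMD bits 23:16. */
#define USBCMD_ITC_SHIFT  16
#define USBCMD_ITC_MASK   (0xffu << USBCMD_ITC_SHIFT)
#define UFRAME_US         125u   /* one high-speed microframe, in us */

/*
 * Hypothetical helper: maximum interval between interrupts, in
 * microseconds, implied by a given USBCMD register value.
 */
static unsigned int itc_to_us(uint32_t usbcmd)
{
    unsigned int uframes = (usbcmd & USBCMD_ITC_MASK) >> USBCMD_ITC_SHIFT;

    return uframes * UFRAME_US;
}
```

With the spec default of 08h this gives 1000 us (1 ms); a value of 01h
would give 125 us, i.e. an interrupt at every microframe boundary.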

>
>> (c) Possible impact on the robustness (of dealing with badly behaving
>> devices or bad bus conditions) as the effective number of retries are
>> now reduced from 32*3 to 32*1.  We can surely counter this by
>> increasing QH_XACTERR_MAX to (32*3)
>
> Again unlikely to have much effect.  If the device or bus is that badly
> behaved or that noisy, you probably shouldn't be using it at all.
>
>>   are there any more that we are missing?
>
> I can't think of any.  Have you tried making this change?  Does it
> really help at all?

Yes.  We seem to have run into a silicon bug where very frequent
retries cause problems.  I have verified that setting CERR to 1 helps;
however, I wasn't sure how sound that approach is.

>
> Alan Stern
>