Re: [PATCH 1/2] usb: dwc3: core: Introduce dwc3_device_reinit()

Felipe Balbi <balbi@xxxxxxxxxx> · Mon, 11 Apr 2016 15:51:38 +0300

Hi,

Roger Quadros <rogerq@xxxxxx> writes:

<snip>

>> I don't have this text anywhere so I don't know. Is this something TI
>> came up with or Synopsys ? Unless I can see a document (preferably from
>> Synopsys) stating this, I can't really accept $subject.
>
> OK. I'll try to find out if there is an official document about this.
>
>> 
>> Another question is: if all it takes is an extra SoftReset, why don't we
>> just reset it during probe() if max_speed < SUPER and we're running on
>> rev < 2.20a ? BTW, which revision of the IP is on AM57x/DRA7x ?
>
> The issue might happen on any Run/Stop transition, so I'm not sure that
> doing a SoftReset just at probe fixes anything.
>
> On DRA7x it is rev 2.02a.

oh, same block as OMAP5 ES2.0 :-(

>>>> question is, then: How are you sure that resetting the device actually
>>>> solves the issue ? Did you really hit the metastability problem and
>>>> noted that it works after a soft-reset ? How did you verify that
>>>
>>> I don't know if it solves the issue or not. It was suggested by
>>> Synopsys to TI's silicon team.
>> 
>> now that's a bummer ;-)
>> 
>>> I never hit the metastability problem detection condition in my
>>> overnight tests (i.e. LTDB_LINK_STATE != 4).
>> 
>> overnight is not enough. You need to keep this running for weeks.
>
> how many weeks is acceptable for you? I can run for that long, no problem.
> And what if the issue doesn't happen in that time frame, would you still
> consider this case?

Well, there's always the possibility we have never triggered the issue
to start with :-) What happens if you remove the current workaround,
set maximum-speed to high-speed and constantly toggle run/stop bit
(there's a soft-connect file under the UDC's directory in sysfs). Can
you ever cause the problems ?
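
A rough sketch of that stress test, assuming the UDC exposes the usual
soft_connect attribute under /sys/class/udc/ and that it accepts the
strings "connect" and "disconnect". The function name, the cycle count
and the delay are made up for illustration; the UDC name depends on the
board.

```python
import time
from pathlib import Path


def toggle_soft_connect(soft_connect_path, cycles, delay_s=0.1):
    """Toggle Run/Stop through the UDC's soft_connect file.

    Writes "disconnect" then "connect" once per cycle and returns the
    total number of writes performed. The path is supplied by the
    caller; nothing here probes for the UDC name.
    """
    path = Path(soft_connect_path)
    writes = 0
    for _ in range(cycles):
        for command in ("disconnect", "connect"):
            # Each write is a fresh open/write/close of the attribute.
            path.write_text(command)
            writes += 1
            time.sleep(delay_s)
    return writes
```

Something like toggle_soft_connect("/sys/class/udc/<udc-name>/soft_connect",
cycles=100000) with a gadget driver loaded would keep Run/Stop flipping
for a long run; <udc-name> is a placeholder, not a real device name.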

>>>> Run/Stop was in a metastable state, considering that Run/Stop signal is
>>>> not visible outside the die ?
>>>
>>> LTDB_LINK_STATE != 4 within 100ms of RUNSTOP set is the condition to
>>> detect that the issue occurred.
>> 
>> this doesn't prove anything. This just means that your 100ms timer
>> expired. Unless you can verify that Run/Stop is in metastability, you
>> cannot be sure this workaround works.
>> 
>> Did anybody run silicon simulation to verify this case ? It's really the
>> only way to be sure.
>
> AFAIK this wasn't reproducible during silicon simulation either.

now this is a big problem. We just don't know if $subject is really
avoiding the problem ;-) Unless we can trigger the problem, we can't be
sure. We are, however, sure that the current workaround avoids the problem
completely.
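
To make the point about the detection condition concrete, here is a
minimal sketch of it as I read it: poll the link state after Run/Stop is
set and give up after 100ms. read_link_state is a hypothetical callback
standing in for whatever reads LTDB_LINK_STATE; the poll interval is an
assumption too.

```python
import time

LINK_STATE_EXPECTED = 4  # value LTDB_LINK_STATE should reach, per the thread


def link_settled(read_link_state, timeout_s=0.1, poll_s=0.001):
    """Return True if read_link_state() reports 4 before the deadline.

    A False result only says the timer expired without seeing state 4.
    It says nothing about why, so it cannot prove that Run/Stop was
    metastable.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_link_state() == LINK_STATE_EXPECTED:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Any other reason for the link not reaching state 4 in time gives the same
False, which is why the timeout alone doesn't validate $subject.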

>>>> It seems to me that resetting the IP is just as "dangerous" as setting
>>>> the IP to High-speed in the first place. No ?
>>>
>>> The soft-reset is just a recovery mechanism if that error is ever hit.
>> 
>> but you don't know if that's a *proper* recovery mechanism because you
>> never even *hit* the error.
>> 
>>> Putting the controller in reset state means it is in a known
>>> state. Why do you say it would be dangerous?
>> 
>> Because you can't predict the system's behavior. If the flip-flop didn't
>> have time to settle into 0 or 1 state, you don't know what the
>> combinatorial part of the IP will do with that bogus value. It's truly
>> unpredictable. You also cannot know, for sure, that a SoftReset will be
>> enough to bring that flip-flop out of metastability.
>
> I'm not an expert in this area and can only follow the advice the
> Silicon team gives.

fair enough. But you must understand we can't just accept anything even
if we never trigger the problems. Unless we're certain about the fix,
without a shadow of a doubt, we might be creating a very, very hard to
debug regression which might end up with sales drop and what not. It's
the kinda thing that we all must be concerned about ;-)

>>> The original workaround i.e. setting the High-speed instance to
>>> Super-speed to avoid this errata is causing another side
>>> effect. i.e. erratic interrupts happen and more than 2 seconds delay
>> 
>> this should have been an expected side-effect when you design a
>> SuperSpeed controller without a SuperSpeed PHY and don't properly
>> terminate inputs. What you have is a floating PIPE3 interface not
>> properly terminated and capturing random noise (basically acting as a
>> very poor antenna inside your die). Of course the IP will go bonkers and
>> give you "erratic error" interrupts. It has no idea what the hell this
>> "PHY" on the PIPE3 interface is doing.
>
> We know that. The damage is already done. :)

right, and I'm trying to avoid further damage caused by a fix that
hasn't been properly validated :)

>>> to enumerations. This problem is more serious and so we have to keep
>>> the controller in High-speed and tackle the meta-stability condition
>>> if it happens.
>> 
>> you have to tackle the meta-stability, sure, but we need a guarantee that
>> $subject IS indeed tackling that problem. Unless there's proof that this
>> really solves the meta-stability issue, I won't take $subject. Sorry
>> dude, but I don't want regressions elsewhere because of a badly
>> validated patch.
>
> I understand. I will see if someone from TI can provide me official
> documentation about the workaround.

thank you

-- 
balbi