Hi, Roger Quadros <rogerq@xxxxxx> writes: <snip> >> I don't have this text anywhere so I don't know. Is this something TI >> came up with or Synopsys ? Unless I can see a document (preferrably from >> Synopsys) stating this, I can't really accept $subject. > > OK. I'll try to find out if there is an official document about this. > >> >> Another question is: if all it takes is an extra SoftReset, why don't we >> just reset it during probe() if max_speed < SUPER and we're running on >> rev < 2.20a ? BTW, which revision of the IP is on AM57x/DRA7x ? > > The issue might happen on any Run/Stop transition so not sure if doing it > SoftReset just at probe fixes anything. > > On DRA7x it is rev 2.02a. oh, same block as OMAP5 ES2.0 :-( >>>> question is, then: How are you sure that resetting the device actually >>>> solves the issue ? Did you really hit the metastability problem and >>>> noted that it works after a soft-reset ? How did you verify that >>> >>> I don't know if it solves the issue or not. It was suggested by >>> Synopsis to TI's silicon team. >> >> now that's a bummer ;-) >> >>> I never hit the metastability problem detection condition in my >>> overnight tests (i.e. LTDB_LINK_STATE != 4). >> >> overnight is not enough. You need to keep this running for weeks. > > how many weeks is acceptable for you? I can run for that long, no problem. > And what if the issue doesn't happen in that time frame, would you still > consider this case? Well, there's always the possibility we have never triggered the issue to start with :-) What happens if you remove the the current workaround, set maximum-speed to high-speed and constantly toggle run/stop bit (there's a soft-connect file under the UDC's directory in sysfs). Can you ever cause the problems ? >>>> Run/Stop was in a metastable state, considering that Run/Stop signal is >>>> not visible outside the die ? >>> >>> LTDB_LINK_STATE != 4 within 100ms or RUNSTOP set is the condition to >>> detect that the issue occurred. >> >> this doesn't prove anything. This just means that your 100ms timer >> expired. Unless you can verify that Run/Stop is in metastability, you >> cannot be sure this workaround works. >> >> Did anybody run silicon simulation to verify this case ? It's really the >> only way to be sure. > > AFAIK this wasn't reproducible during silicon simulation either. now this is a big problem. We just don't know if $subject is really avoiding the problem ;-) Unless we can trigger the problem, we can't be sure. We are, however, sure that current workaround avoids the problem completely. >>>> It seems to me that resetting the IP is just as "dangerous" as setting >>>> the IP to High-speed in the first place. No ? >>> >>> The soft-reset is just a recovery mechanism if that error is ever hit. >> >> but you don't know if that's a *proper* recovery mechanism because you >> never even *hit* the error. >> >>> Putting the controller in reset state means it is in a known >>> state. Why do you say it would be dangerous? >> >> Because you can't predict the systems' behavior. If the flip-flop didn't >> have time to settle into 0 or 1 state, you don't know what the >> combinatorial part of the IP will do with that bogus value. It's truly >> unpredictable. You also cannot know, for sure, that a SoftReset will be >> enough to bring that flip-flop out of metastability. > > I'm not an expert in this area and can only follow the advice the > Silicon team gives. fair enough. But you must understand we can't just accept anything even if we never trigger we problems. Unless we're certain about the fix, without a shadow of a doubt, we might be creating a very, very hard to debug regression which might end up with sales drop and what not. It's the kinda thing that we all must be concerned about ;-) >>> The original workaround i.e. setting the High-speed instance to >>> Super-speed to avoid this errata is causing another side >>> effect. i.e. erratic interrupts happen and more than 2 seconds delay >> >> this should have been an expected side-effect when you design a >> SuperSpeed controller without a SuperSpeed PHY and don't properly >> terminate inputs. What you have is a floating PIPE3 interface not >> properly terminated and capturing random noise (basically acting as a >> very poor antenna inside your die). Of course the IP will go bonkers and >> give you "erratic error" interrupts. It has no idea what the hell this >> "PHY" on the PIPE3 interface is doing. > > We know that. The damage is already done. :) right, and I'm trying to avoid further damage caused by a fix that hasn't been properly validated :) >>> to enumerations. This problem is more serious and so we have to keep >>> the controller in High-speed and tackle the meta-stability condition >>> if it happens. >> >> you have to tackle the meta-stability, sure, but we need guarantee that >> $subject IS indeed tackling that problem. Unless there's proof that this >> really solves the meta-stability issue, I won't take $subject. Sorry >> dude, but I don't want regressions elsewhere because of a badly >> validated patch. > > I understand. I will see if someone from TI can provide me official > documentation about the workaround. thank you -- balbi
Attachment:
signature.asc
Description: PGP signature