Re: Query - resetting and reenumerating root-hub

Alexander Kurpiers <a.kurpiers@xxxxxxxxxxxxxxxxxx> · Wed, 09 Jun 2010 14:38:45 +0200

On 09.06.2010 13:44, Gadiyar, Anand wrote:
> Alexander Kurpiers wrote:
>   
>> Gadiyar, Anand wrote:
>>     
>>> Alan Stern wrote:
>>>   
>>>       
>>>> On Mon, 7 Jun 2010, Gadiyar, Anand wrote:
>>>>     
>>>>         
>>>>> Hi all,
>>>>>
>>>>> On the OMAP3, we have a new hardware bug that causes the
>>>>> EHCI controller to lock up under heavy stress. The only
>>>>> known way to recover is to soft-reset the controller.
>>>>>
>>>>> I'm trying to implement some kind of recovery mechanism
>>>>> for the ehci-omap driver. Is there a way to inform the
>>>>> USB core that the root-hub and down-stream devices have
>>>>> been reset and need to be re-enumerated?
>>>>
>>>> There's usb_hc_died().  It tells the core that the controller has
>>>> stopped working.  The core then removes all devices below the root hub
>>>> and marks the root hub as non-operational (the state is set to
>>>> USB_STATE_NOTATTACHED).  But there is no re-enumeration; from that
>>>> point on the root hub is unusable.
>>>>
>>>> This may not be exactly what you want.  Perhaps a better match would be
>>>> usb_reset_device(), but that routine specifically excludes root hubs.
>>>> You might be able to adjust it in some way, though.
>>>>
>>>> Another alternative is simply to unregister the hcd and then
>>>> re-register it.
>>>>
>>> (Sorry, hit send before completing the mail)
>>>
>>> Thanks! As a quick test, I built the driver as a module. And
>>> in the remove path, we soft-reset the controller anyway.
>>>
>> as I was the one who originally reported this problem, I can tell you
>> that soft-reset of EHCI is not enough. I had to reset UHH to recover -
>> sometimes losing EHCI for good, but I guess that was something else in
>> the end.
>>     
> Yes, we need to soft-reset the whole block through the UHH_SYSCONFIG,
> not just EHCI.
>
> I am still not able to make this mechanism work reliably. I've tried
> a "save-functional-registers, soft-reset, restore-registers" approach
> and it used to work a little better.
>
> Another problem is detecting the lockup itself...
>

This was rather easy: the IAA watchdog triggered, and looking at the qTD
in question you could see that it was stuck on the setup packet, as far
as I remember (I must still have some traces in the office). I never saw
the problem on anything other than EP0, but that is what the bug
description says anyway.
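
To make the symptom concrete, here is a minimal user-space sketch of the
check on a qTD token, assuming the lockup really does show up as an
active qTD whose PID code is SETUP. The bit positions follow the EHCI
qTD token layout (Active is bit 7, PID code is bits 9:8, SETUP is 2).
The name qtd_stuck_on_setup() and the QTD_PID_SETUP macro are made up
for illustration; this is not code from ehci-hcd.

```c
#include <stdbool.h>
#include <stdint.h>

/* EHCI qTD token fields (EHCI spec, qTD token dword). */
#define QTD_STS_ACTIVE  (1u << 7)               /* status: transaction still active */
#define QTD_PID(tok)    (((tok) >> 8) & 0x3u)   /* 0 = OUT, 1 = IN, 2 = SETUP */
#define QTD_PID_SETUP   2u                      /* illustrative name */

/*
 * Hypothetical helper: returns true if the token describes a SETUP
 * transaction that the controller has not yet retired. It is meant to
 * be called only after the IAA watchdog has already fired, since an
 * active SETUP qTD on its own is perfectly normal.
 */
static bool qtd_stuck_on_setup(uint32_t token)
{
    return (token & QTD_STS_ACTIVE) != 0 && QTD_PID(token) == QTD_PID_SETUP;
}
```

The check only narrows things down; the watchdog firing is still the
actual trigger.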

So what needs to be done is: detect the lockup, reset, and then
re-trigger device detection and enumeration.

I can give you a test case that triggers the problem quite reliably
within 10 minutes to 1 hour.
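
The detect / reset / re-enumerate ordering could be sketched as below.
This is a user-space model only: the recovery_ops hooks, try_recover()
and the fake_hc stubs are all hypothetical names, not existing kernel
interfaces, and how to implement the re-enumeration hook is exactly the
open question in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks; none of these exist in the kernel under these names. */
struct recovery_ops {
    bool (*lockup_detected)(void *ctx); /* e.g. IAA watchdog fired, qTD stuck   */
    int  (*reset_block)(void *ctx);     /* e.g. soft-reset through UHH_SYSCONFIG */
    int  (*reenumerate)(void *ctx);     /* re-trigger detection and enumeration  */
};

enum recovery_result {
    RECOVERY_FAILED     = -1,
    RECOVERY_NOT_NEEDED = 0,
    RECOVERY_DONE       = 1,
};

/* Run the three steps in order and stop at the first failure. */
static enum recovery_result try_recover(const struct recovery_ops *ops, void *ctx)
{
    if (!ops->lockup_detected(ctx))
        return RECOVERY_NOT_NEEDED;
    if (ops->reset_block(ctx) != 0)
        return RECOVERY_FAILED;
    if (ops->reenumerate(ctx) != 0)
        return RECOVERY_FAILED;
    return RECOVERY_DONE;
}

/* Fake controller, only there to exercise the ordering. */
struct fake_hc {
    bool locked;
    int resets;
    int enumerations;
};

static bool fake_detect(void *ctx)
{
    return ((struct fake_hc *)ctx)->locked;
}

static int fake_reset(void *ctx)
{
    struct fake_hc *hc = ctx;

    hc->resets++;
    hc->locked = false; /* assume the reset clears the lockup */
    return 0;
}

static int fake_reenumerate(void *ctx)
{
    ((struct fake_hc *)ctx)->enumerations++;
    return 0;
}

static const struct recovery_ops fake_ops = {
    .lockup_detected = fake_detect,
    .reset_block     = fake_reset,
    .reenumerate     = fake_reenumerate,
};
```

Calling try_recover(&fake_ops, &hc) on a fake_hc with locked set runs
one reset and one enumeration; a second call does nothing because the
lockup is gone.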
