On 03/20/2013 08:40 AM, Alan Stern wrote: > On Tue, 19 Mar 2013, Stephen Warren wrote: > >> On 03/19/2013 04:48 PM, Stephen Warren wrote: >>> On 03/19/2013 02:07 PM, Alan Stern wrote: >> ... >>>> A dmesg log with CONFIG_USB_DEBUG enabled would be helpful. We ought >>>> to be able to tell where khubd is getting stuck. >>> >>> Hmmm. Enabling CONFIG_USB_DEBUG appears to mask the problem. I assume >>> this is some kind of timing/race condition, unless there's some code >>> with required side-effects hiding under ifdef CONFIG_USB_DEBUG somehow? >> >> Some further information: I added some printks, which are hopefully >> obvious from the text below, and in the failing case, I see: >> >>> [ 1.291277] single_unlink_async: qh ee31bc40 qh_state set to UNLINK_WAIT >>> [ 1.297960] start_iaa_cycle: qh ee31bc40 qh_state changing UNLINK_WAIT -> UNLINK >> ... >>> [ 6.452331] ehci_urb_dequeue: qh ee31bc40 attempting dequeue (qh_stated 2) >> >> This is with CONFIG_USB_DEBUG disabled. That seems to happen to the very >> first (and only) URB(?) ever issued. > > There should be a call to ehci_end_async_unlink shortly after > start_iaa_cycle. > >> If I enable CONFIG_USB_DEBUG, then I see the more expected: >> >>> [ 1.540410] single_unlink_async: qh ee0c0300 qh_state set to UNLINK_WAIT >>> [ 1.547094] start_iaa_cycle: qh ee0c0300 qh_state changing UNLINK_WAIT -> UNLINK >>> [ 1.554487] start_iaa_cycle: qh ee0c0300 qh_state was UNLINK; processing >> >> followed by a whole slew of subsequent URBs being submitted and processed. >> >> Perhaps the issue is that start_iaa_cycle() (or whatever triggers it) >> only happens when there's an URB in state UNLINK, but not when there's >> only one in state UNLINK_WAIT, so that it only happens once rather than >> the required twice? I'm not sure why a timing difference would affect >> this though, unless there's some loop in the IAA processing that happens >> to do both the UNLINK_WAIT->UNLINK change, and the processing of the URB >> in the UNLINK state in one invocation sometimes, but not others? > > A possible explanation for a timing difference is that the IAA > mechanism simply isn't working right on your controller, so the IAA > watchdog timer expires. It's worth pointing out that your second log > excerpt above has messages spaced at intervals of 7 ms, which is > strange considering that the second line should get printed almost > immediately after the first. Excessively long delays like this do > point in the direction of a timer expiration. Bugs probably wouldn't surprise me:-) > Anyway, there _is_ a mistake in the "work around silicon bug" patch; I > forgot to update the timeout handler. Does the patch below help? You > should build it without CONFIG_USB_DEBUG; I changed the ehci_vdbg to > ehci_info so that it would show up in the log regardless and we can see > if it gets triggered. Yes, with that applied, I see: > root@localhost:~# dmesg|grep IAA > [ 1.310994] tegra-ehci tegra-ehci.0: IAA watchdog: status a0a0 cmd 10025 > [ 1.322006] tegra-ehci tegra-ehci.0: IAA watchdog: status 80a0 cmd 10025 and reboot and shutdown both work fine. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-tegra" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html