On Tue, 1 Dec 2009, Ondrej Zary wrote: > On Monday 30 November 2009, Alan Stern wrote: > > On Mon, 30 Nov 2009, Ondrej Zary wrote: > > > It does not make much sense to me but I think that it crashes iside this > > > list manipulation: > > > > > > prev = ehci->async; > > > while (prev->qh_next.qh != qh) > > > prev = prev->qh_next.qh; > > > > Yes, it's crashing in the "while" test because prev is NULL. This > > means the code is looking for qh in the async list but not finding it. > > That's supposed to be impossible. > > > > The assembly code is peculiar because it includes stuff that isn't in > > the source code! For example, right at this point (after the end of > > the loop) there's a test to see whether prev is NULL. Where could that > > have come from? Do you have any idea? > > I'm not sure, I might did something wrong and left it there from my previous > debugging attempt. Okay. But it was part of the reason you had difficulty matching up the ehci-hcd.s file with the crash code dump. > > > prev->hw_next = qh->hw_next; > > > prev->qh_next = qh->qh_next; > > > wmb (); > > > > These lines aren't reached. > > > > Does this happen every time you disconnect the Nexio? > > The crash happens almost always when disconnecting the touchscreen. > When booted without X, it often survives the first disconnect. Then it will be easy to recreate the problem. :-) > > You can try patching that loop. If prev is NULL then print an error > > message in the log, including the value of qh and the value of > > ehci->async, and jump past the following three statements. > > > > With that change the system shouldn't crash, although khubd might hang. > > But we still need to find out how this could have happened. Try > > collecting a usbmon trace while running the test; then let's compare > > the usbmon output with the error messages in the log. > > gcc version is: gcc (Debian 4.3.4-6) 4.3.4 > > Tried something like that before but it did not help at all. > The check is not triggered and it still oopses. Now it looks like this: > > qh->qh_state = QH_STATE_UNLINK; > ehci->reclaim = qh = qh_get (qh); > > prev = ehci->async; > if (!prev) { > printk("prev is NULL, qh=%p, ehci->async=%p\n", qh, ehci->async); > goto after; > } [*] > while (prev->qh_next.qh != qh) { > if (!prev) { > printk("prev is NULL, qh=%p, ehci->async=%p\n", qh, ehci->async); > goto after; > } > prev = prev->qh_next.qh; > } > > prev->hw_next = qh->hw_next; > prev->qh_next = qh->qh_next; > wmb (); > > after: No, that is wrong. The [*] line can still perform an invalid dereference. You need to move the inside test to the end of the loop, after the assignment statement. Or else do this: prev = ehci->async; for (;;) { if (!prev) { printk(KERN_ERR "prev is NULL, qh=%p, " "ehci->async=%p\n", qh, ehci->async); goto after; } if (prev->qh_next.qh == qh) break; prev = prev->qh_next.qh; } Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html