On Tuesday 01 December 2009, Alan Stern wrote: > On Tue, 1 Dec 2009, Ondrej Zary wrote: > > On Monday 30 November 2009, Alan Stern wrote: > > > On Mon, 30 Nov 2009, Ondrej Zary wrote: > > > > It does not make much sense to me but I think that it crashes iside > > > > this list manipulation: > > > > > > > > prev = ehci->async; > > > > while (prev->qh_next.qh != qh) > > > > prev = prev->qh_next.qh; > > > > > > Yes, it's crashing in the "while" test because prev is NULL. This > > > means the code is looking for qh in the async list but not finding it. > > > That's supposed to be impossible. > > > > > > The assembly code is peculiar because it includes stuff that isn't in > > > the source code! For example, right at this point (after the end of > > > the loop) there's a test to see whether prev is NULL. Where could that > > > have come from? Do you have any idea? > > > > I'm not sure, I might did something wrong and left it there from my > > previous debugging attempt. > > Okay. But it was part of the reason you had difficulty matching up the > ehci-hcd.s file with the crash code dump. > > > > > prev->hw_next = qh->hw_next; > > > > prev->qh_next = qh->qh_next; > > > > wmb (); > > > > > > These lines aren't reached. > > > > > > Does this happen every time you disconnect the Nexio? > > > > The crash happens almost always when disconnecting the touchscreen. > > When booted without X, it often survives the first disconnect. > > Then it will be easy to recreate the problem. :-) Yes, seems so. But it's not that easy (see below). > > > You can try patching that loop. If prev is NULL then print an error > > > message in the log, including the value of qh and the value of > > > ehci->async, and jump past the following three statements. > > > > > > With that change the system shouldn't crash, although khubd might hang. > > > But we still need to find out how this could have happened. Try > > > collecting a usbmon trace while running the test; then let's compare > > > the usbmon output with the error messages in the log. > > > > gcc version is: gcc (Debian 4.3.4-6) 4.3.4 > > > > Tried something like that before but it did not help at all. > > The check is not triggered and it still oopses. Now it looks like this: > > > > qh->qh_state = QH_STATE_UNLINK; > > ehci->reclaim = qh = qh_get (qh); > > > > prev = ehci->async; > > if (!prev) { > > printk("prev is NULL, qh=%p, ehci->async=%p\n", qh, > > ehci->async); goto after; > > } > > [*] > while (prev->qh_next.qh != qh) { > > > if (!prev) { > > printk("prev is NULL, qh=%p, ehci->async=%p\n", > > qh, ehci->async); goto after; > > } > > prev = prev->qh_next.qh; > > } > > > > prev->hw_next = qh->hw_next; > > prev->qh_next = qh->qh_next; > > wmb (); > > > > after: > > No, that is wrong. The [*] line can still perform an invalid > dereference. You need to move the inside test to the end of the loop, > after the assignment statement. Or else do this: Oops, such a stupid mistake. Fixed now: prev = ehci->async; if (!prev) { printk("prev is NULL, qh=%p, ehci->async=%p\n", qh, ehci->async); goto after; } while (prev->qh_next.qh != qh) { prev = prev->qh_next.qh; if (!prev) { printk("prev is NULL, qh=%p, ehci->async=%p\n", qh, ehci->async); goto after; } } prev->hw_next = qh->hw_next; prev->qh_next = qh->qh_next; wmb (); after: It shows "prev is NULL, qh=f6581080, ehci->async=f6581000". The problem is that activating usbmon causes the problem to disappear. No errors in maybe 15 attempts. It appeared on 2nd attempt after unloading usbmon. -- Ondrej Zary -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html