Re: debugging oops after disconnecting Nexio USB touchscreen

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 1 Dec 2009, Ondrej Zary wrote:

> On Monday 30 November 2009, Alan Stern wrote:
> > On Mon, 30 Nov 2009, Ondrej Zary wrote:
> > > It does not make much sense to me but I think that it crashes iside this
> > > list manipulation:
> > >
> > >         prev = ehci->async;
> > >         while (prev->qh_next.qh != qh)
> > >                 prev = prev->qh_next.qh;
> >
> > Yes, it's crashing in the "while" test because prev is NULL.  This
> > means the code is looking for qh in the async list but not finding it.
> > That's supposed to be impossible.
> >
> > The assembly code is peculiar because it includes stuff that isn't in
> > the source code!  For example, right at this point (after the end of
> > the loop) there's a test to see whether prev is NULL.  Where could that
> > have come from?  Do you have any idea?
> 
> I'm not sure, I might did something wrong and left it there from my previous
> debugging attempt.

Okay.  But it was part of the reason you had difficulty matching up the 
ehci-hcd.s file with the crash code dump.

> > >         prev->hw_next = qh->hw_next;
> > >         prev->qh_next = qh->qh_next;
> > >         wmb ();
> >
> > These lines aren't reached.
> >
> > Does this happen every time you disconnect the Nexio?
> 
> The crash happens almost always when disconnecting the touchscreen.
> When booted without X, it often survives the first disconnect.

Then it will be easy to recreate the problem.  :-)

> > You can try patching that loop.  If prev is NULL then print an error
> > message in the log, including the value of qh and the value of
> > ehci->async, and jump past the following three statements.
> >
> > With that change the system shouldn't crash, although khubd might hang.
> > But we still need to find out how this could have happened.  Try
> > collecting a usbmon trace while running the test; then let's compare
> > the usbmon output with the error messages in the log.
> 
> gcc version is: gcc (Debian 4.3.4-6) 4.3.4
> 
> Tried something like that before but it did not help at all.
> The check is not triggered and it still oopses. Now it looks like this:
> 
>         qh->qh_state = QH_STATE_UNLINK;
>         ehci->reclaim = qh = qh_get (qh);
> 
>         prev = ehci->async;
>         if (!prev) {
>                 printk("prev is NULL, qh=%p, ehci->async=%p\n", qh, ehci->async);
>                 goto after;
>         }
[*] >         while (prev->qh_next.qh != qh) {
>                 if (!prev) {
>                         printk("prev is NULL, qh=%p, ehci->async=%p\n", qh, ehci->async);
>                         goto after;
>                 }
>                 prev = prev->qh_next.qh;
>         }
> 
>         prev->hw_next = qh->hw_next;
>         prev->qh_next = qh->qh_next;
>         wmb ();
> 
> after:

No, that is wrong.  The [*] line can still perform an invalid
dereference.  You need to move the inside test to the end of the loop,
after the assignment statement.  Or else do this:

	prev = ehci->async;
	for (;;) {
		if (!prev) {
			printk(KERN_ERR "prev is NULL, qh=%p, "
					"ehci->async=%p\n", qh, ehci->async);
			goto after;
			}
		if (prev->qh_next.qh == qh)
			break;
		prev = prev->qh_next.qh;
	}

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux