On Fri, 18 Feb 2011, Don Zickus wrote:

> Hi,
>
> I am trying to debug a panic for our 2.6.32 based kernel (there isn't any
> changes to usb other than usb3 stuff in our version of 2.6.32)

You mean it includes all the changes below drivers/usb that are in the
vanilla 2.6.37 kernel (except for the USB-3 stuff)?

> The panic is attached below and I am having trouble reproducing it, so I
> am trying to 'think' this one out. I believe there is a race condition
> but I don't know enough about the paths in usb to understand it and was
> hoping for some help from the mailing list.

You should consider enabling CONFIG_PRINTK_TIME in all your test kernels.
The high-precision timestamps can often help with debugging.

> The problem is when running a stress test (scrashme) on a powerpc blade,
> we believe someone accidentally pushed a button on the blade that 'magical
> routes' the side mounted usb cdrom to that blade. A few moments later, we
> believe that someone realized their error and pressed the button on the
> correct blade, thus disconnecting the cdrom and having it routed to the
> other blade.
>
> As a result the below panic happened. Looking at where the panic happened
> and the assembly code, I am reasonably confident the panic happened at:
>
> drivers/usb/core/hcd.c::usb_hcd_unlink_urb::1459
>
> (right before the unlink1 command)
> hcd = bus_to_hcd(urb->dev->bus);
>
> what happens is that urb->dev is NULL and thus the derefence to dev->bus
> panics the box.

Are you sure that urb->dev is NULL? As opposed to pointing to a memory
location that used to be occupied by a device structure and now contains
some other data?

> The only way I can see that happening is usb_put_dev went to zero and
> released the device (which would mean the usb_put_dev a couple lines later
> would cause another friendly message).

This would not affect urb->dev, which suggests that you're not looking at
it the right way.
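
To illustrate why dropping the last reference would leave urb->dev
unchanged, here is a minimal userspace sketch. It is only an analogy:
toy_dev, toy_urb, toy_put_dev and toy_stale_pointer_demo are hypothetical
names made up for this example, not the kernel's structures or its
usb_put_dev(), and the "release" merely sets a flag instead of freeing
memory so that the example stays well-defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted device structure. */
struct toy_dev {
	int refcount;
	int released;	/* set when the last reference is dropped */
};

/* Hypothetical stand-in for an URB that keeps a pointer to its device. */
struct toy_urb {
	struct toy_dev *dev;
};

/*
 * Drop one reference.  When the count reaches zero the device is marked
 * released (a real release routine would free the memory here).  Note
 * that this function has no way to find, let alone clear, the other
 * pointers that still refer to the device.
 */
static void toy_put_dev(struct toy_dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->released = 1;
}

/*
 * Returns 1 if, after the last reference is dropped, the URB's dev
 * pointer is still non-NULL even though the device has been released.
 */
static int toy_stale_pointer_demo(void)
{
	struct toy_dev dev = { .refcount = 1, .released = 0 };
	struct toy_urb urb = { .dev = &dev };

	toy_put_dev(&dev);

	return urb.dev != NULL && dev.released;
}
```

In this sketch the pointer held by the toy URB keeps its old value after
the release, so it is stale rather than NULL. That is the distinction
behind the question above about whether urb->dev is really NULL.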

> My first impression is that there
> is a race condition somewhere, but I don't know the different paths well
> enough to know where.
>
> Does anyone have any thoughts about this or can help me through this
> (especially since I am having trouble reproducing it :-/).

I could help, given more information. At this stage, I don't think you
know enough about the problem to be able to track it down. Unless you can
reproduce the bug, the situation may be hopeless. Sorry...

Alan Stern