On Thu, 11 Sep 2014, Joe Lawrence wrote:

> Hi Alan,
>
> I've got another USB bug to report that manifests during automated
> device removal testing on RHEL7.  This one hits the BUG() inside
> qh_destroy:

How reliably can you trigger this bug?

> static void qh_destroy(struct ehci_hcd *ehci, struct ehci_qh *qh)
> {
>         /* clean qtds first, and know this is not linked */
>         if (!list_empty (&qh->qtd_list) || qh->qh_next.ptr) {
>                 ehci_dbg (ehci, "unused qh not empty!\n");
>                 BUG ();
>         }

> and finally a dump of the ehci_qh in question:
>
> crash> struct ehci_qh ffff88084b84dc80
> struct ehci_qh {
>   hw = 0xffff880078d1a000,

It would be good to see the contents of the ehci_qh_hw structure.  That
would tell us what device and endpoint this QH was for.

>   qh_dma = 0x78d1a000,
>   qh_next = {
>     qh = 0xffff88084efe6730,
>     itd = 0xffff88084efe6730,
>     sitd = 0xffff88084efe6730,
>     fstn = 0xffff88084efe6730,
>     hw_next = 0xffff88084efe6730,
>     ptr = 0xffff88084efe6730          << !NULL
>   },
>   qtd_list = {                        << list_empty
>     next = 0xffff88084b84dc98,
>     prev = 0xffff88084b84dc98
>   },
>   intr_node = {
>     next = 0x0,
>     prev = 0x0
>   },
>   dummy = 0xffff880078d22000,
>   unlink_node = {
>     next = 0xffff88084b84dcc0,
>     prev = 0xffff88084b84dcc0
>   },
>   unlink_cycle = 0x0,
>   qh_state = 0x1,                     << QH_STATE_LINKED
>   ...
> }
>
> The qtd_list is empty, contains only one entry, itself.
>
> crash> struct -o ehci_qh | grep td_list
>   [0x18] struct list_head qtd_list;
> crash> p/x 0xffff88084b84dc80 + 0x18
> $1 = 0xffff88084b84dc98
>
> but qh->qh_next.ptr is !NULL, so we hit the BUG.  However, it seems that
> the memory at qh->qh_next.ptr has been freed:

> I'm not too familiar with the USB code stack, so any suggestions on
> instrumentation that I can add to aid in debugging would be helpful.
> Maybe some tracing in qh_link_async / single_unlink_async /
> end_unlink_async / qh_link_periodic can reveal the sequence that is
> leaving this dangling qh_next.ptr?

The place to look is ehci_endpoint_disable.
Did that routine get called for this QH?  Did it hit the default case of
the big switch statement (with its ehci_err statement)?

> Note: This does bear some resemblance to a bug that Stratus hit a few
> years ago [1] [2], however enough of the code has changed that I'm not
> sure the fix for that one would apply to a modern kernel.

What version of the driver are you currently running?

Alan Stern
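
For readers following the dump above, here is a minimal userspace sketch of
the condition quoted from qh_destroy.  It is not the driver code: struct
model_qh, init_list_head, list_is_empty and qh_destroy_would_bug are
hypothetical names made up for illustration, and the model keeps only the
two fields the check looks at.  It shows why a self-referencing qtd_list
counts as empty while a non-NULL qh_next.ptr alone is enough to reach the
BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Hypothetical, trimmed-down model of struct ehci_qh.  Only the two
 * fields tested by the quoted check are kept; the real layout differs.
 */
struct model_qh {
	void *qh_next_ptr;		/* stands in for qh->qh_next.ptr */
	struct list_head qtd_list;	/* stands in for qh->qtd_list */
};

/* An empty list head points at itself in both directions. */
static void init_list_head(struct list_head *list)
{
	assert(list != NULL);
	list->next = list;
	list->prev = list;
}

/* Same test the kernel's list_empty() performs: next == head. */
static bool list_is_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Mirrors the condition quoted from qh_destroy(). */
static bool qh_destroy_would_bug(const struct model_qh *qh)
{
	return !list_is_empty(&qh->qtd_list) || qh->qh_next_ptr != NULL;
}
```

In the dump, qtd_list.next and qtd_list.prev both equal the address of
qtd_list itself (the QH address plus offset 0x18), which is the
list_is_empty case here, so the only term of the condition left to fire is
the non-NULL qh_next.ptr.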