On Mon, Sep 24, 2012 at 12:59:51PM -0400, Alan Stern wrote:
> > If you want to track down what's going wrong, you'll have to add some
> > debugging code to usb_device_read() and usb_remove_hcd().  By the time
> > usb_device_dump() starts running, it's already too late.
>
> After thinking about this some more, I realized that my patch still
> leaves a race -- although the oops would occur in a different place
> (where usb_device_read checks bus->root_hub->devnum).
>
> Here's a different patch which should work better.  It relies on the
> rh_registered flag in the usb_hcd structure, which persists as long as
> the usb_bus structure does, rather than on anything stored in the
> root-hub device structure.

Hi Alan,

This patch seemed to be successful.  I have pasted our customer's response
below (we backported the patch to RHEL-6/2.6.32):

"
Testing of this new patch went very well.

In previous tests, the kernel panicked during the first or second surprise
removal of the ehci_hcd PCI device on an idle system.  I ran an entire night
of surprise and polite device removals with no kernel panic or Oops.
slub_debug=FZPU was used to poison deallocated storage blocks and check for
use-after-free.  No BUGs were logged by the allocator.

Numerous messages like the following were seen at the console, indicating
that the condition that previously led to the panic was still occurring:

    cat: /proc/bus/usb/001/006: No such device

Surprise removals occur when the device is electrically disconnected from
the PCI bus while in use.  During the subsequent clean-up, the driver's
remove function is called.

Polite removals occur when the driver's remove function is called while the
device is in use.  After the driver returns, the device is electrically
disconnected from the PCI bus.

I ran 5 hours of surprise and polite removals on the same idle system used
for the previous tests, about 60 trials of each type of removal.
Then a workload was started to generate moderate CPU, memory, and disk
stress; the test continued for another 8.5 hours, executing about 38 more
trials of each type of PCI removal.
"

Is there anything else you need from us, or can we move forward with this
patch?

Thanks again!

Cheers,
Don

>
> Index: usb-3.6/drivers/usb/core/devices.c
> ===================================================================
> --- usb-3.6.orig/drivers/usb/core/devices.c
> +++ usb-3.6/drivers/usb/core/devices.c
> @@ -624,7 +624,7 @@ static ssize_t usb_device_read(struct fi
>  	/* print devices for all busses */
>  	list_for_each_entry(bus, &usb_bus_list, bus_list) {
>  		/* recurse through all children of the root hub */
> -		if (!bus->root_hub)
> +		if (!bus_to_hcd(bus)->rh_registered)
>  			continue;
>  		usb_lock_device(bus->root_hub);
>  		ret = usb_device_dump(&buf, &nbytes, &skip_bytes, ppos,
> Index: usb-3.6/drivers/usb/core/hcd.c
> ===================================================================
> --- usb-3.6.orig/drivers/usb/core/hcd.c
> +++ usb-3.6/drivers/usb/core/hcd.c
> @@ -1011,10 +1011,7 @@ static int register_root_hub(struct usb_
>  	if (retval) {
>  		dev_err (parent_dev, "can't register root hub for %s, %d\n",
>  				dev_name(&usb_dev->dev), retval);
> -	}
> -	mutex_unlock(&usb_bus_list_lock);
> -
> -	if (retval == 0) {
> +	} else {
>  		spin_lock_irq (&hcd_root_hub_lock);
>  		hcd->rh_registered = 1;
>  		spin_unlock_irq (&hcd_root_hub_lock);
> @@ -1023,6 +1020,7 @@ static int register_root_hub(struct usb_
>  		if (HCD_DEAD(hcd))
>  			usb_hc_died (hcd);	/* This time clean up */
>  	}
> +	mutex_unlock(&usb_bus_list_lock);
>  
>  	return retval;
>  }
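For readers following the locking argument, here is a simplified user-space model of the idea behind the patch, not kernel code.  It assumes a pthread mutex as a stand-in for usb_bus_list_lock, a plain struct in place of usb_hcd/usb_bus, and it ignores hcd_root_hub_lock entirely.  The removal side is an assumption: usb_remove_hcd() is not part of the quoted patch, so here it simply clears the flag under the same mutex.  All names (model_bus, model_register_root_hub, model_remove_root_hub, model_read_devnum) are hypothetical.  The point it illustrates: the reader only touches root-hub state after seeing rh_registered set while holding the lock that the registration path, after this patch, keeps held across the flag update.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: bus_list_lock stands in for usb_bus_list_lock,
 * rh_registered for hcd->rh_registered.  Not kernel code. */
struct model_bus {
	bool rh_registered;   /* written only while holding bus_list_lock */
	int  root_hub_devnum; /* meaningful only while rh_registered is true */
};

static pthread_mutex_t bus_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Registration path: the flag is published before the lock is dropped,
 * mirroring the patch moving mutex_unlock() below the rh_registered store. */
void model_register_root_hub(struct model_bus *bus, int devnum)
{
	pthread_mutex_lock(&bus_list_lock);
	bus->root_hub_devnum = devnum;
	bus->rh_registered = true;
	pthread_mutex_unlock(&bus_list_lock);
}

/* Assumed removal path: clear the flag and invalidate state under the lock. */
void model_remove_root_hub(struct model_bus *bus)
{
	pthread_mutex_lock(&bus_list_lock);
	bus->rh_registered = false;
	bus->root_hub_devnum = -1;
	pthread_mutex_unlock(&bus_list_lock);
}

/* Reader path, analogous to the check in usb_device_read(): skip the bus
 * unless its root hub is registered.  Returns the devnum, or -1 if skipped. */
int model_read_devnum(struct model_bus *bus)
{
	int devnum = -1;

	pthread_mutex_lock(&bus_list_lock);
	if (bus->rh_registered)
		devnum = bus->root_hub_devnum;
	pthread_mutex_unlock(&bus_list_lock);
	return devnum;
}
```

Because the flag lives in the long-lived structure rather than in the root-hub device, the reader never has to dereference something that removal may already have freed; it only needs the flag to be consistent with the lock it holds.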