Hi Alan, Changbin, Xenia, I've twice encountered a crash on system reboot in usb_disable_device that looks to be a bos descriptor use-after-free. The machine in question is running a 4.5-rc5 kernel with proprietary and out-of-tree Stratus drivers that facilitate device-removal testing -- these drivers give us the ability to automate/simulate device removal of PCI devices at any time. None of these affect any of the USB drivers or stack directly, they mainly consist of platform-aware features, PCI hotplug calls and a few fixups in storage/network land to better survive the testing. Additional debugging features were enabled: slub_debug=FZPU and the following build options: CONFIG_DEBUG_OBJECTS=y CONFIG_DEBUG_OBJECTS_FREE=y CONFIG_DEBUG_OBJECTS_TIMERS=y CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1 I was able to scrape a dump from the crash, its backtrace looks like: [ 4095.109442] usb 6-1: device descriptor read/8, error -22 [ 4095.240435] usb 6-1: device descriptor read/8, error -22 PID: 18692 TASK: ffff881034229640 CPU: 9 COMMAND: "kworker/9:0" #0 [ffff880a9907f9c8] machine_kexec at ffffffff8105c97b #1 [ffff880a9907fa28] __crash_kexec at ffffffff811097f2 #2 [ffff880a9907faf8] crash_kexec at ffffffff811098df #3 [ffff880a9907fb10] oops_end at ffffffff8101ba89 #4 [ffff880a9907fb38] die at ffffffff8101bfab #5 [ffff880a9907fb68] do_general_protection at ffffffff81019289 #6 [ffff880a9907fb90] general_protection at ffffffff81686948 [exception RIP: usb_disable_ltm+0x4c] RIP: ffffffff8149210c RSP: ffff880a9907fc48 RFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88103756dcd0 RCX: 0000000100300014 RDX: 6b6b6b6b6b6b6b6b RSI: ffffea0040c3ac00 RDI: ffff88103756dcd0 RBP: ffff880a9907fc80 R8: ffff881030eb1110 R9: 0000000000000000 R10: ffff881030eb2df0 R11: 0000000000000000 R12: 0000000000000000 R13: ffff88103617c290 R14: 0000000000000001 R15: 0000000000000012 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff880a9907fc48] usb_disable_device at ffffffff8149f0fa #8 [ffff880a9907fc88] usb_disconnect at ffffffff81494c74 #9 [ffff880a9907fcd8] hub_port_connect at ffffffff81496790 #10 [ffff880a9907fd58] hub_event at ffffffff8149767c #11 [ffff880a9907fe18] process_one_work at ffffffff8109862f #12 [ffff880a9907fe60] worker_thread at ffffffff81098f15 #13 [ffff880a9907fec8] kthread at ffffffff8109ea88 #14 [ffff880a9907ff50] ret_from_fork at ffffffff81684a0f The crash occurred when dereferencing RDX, which is set to the slub_debug poison pattern. The code in question: crash> dis -l usb_disable_ltm+0x4c drivers/usb/core/hub.c: 2976 <usb_disable_ltm+0x4c>: testb $0x2,0x3(%rdx) drivers/usb/core/hub.c :: usb_disable_ltm : 2973 struct usb_hcd *hcd = bus_to_hcd(udev->bus); 2974 2975 /* Check if the roothub and device supports LTM. */ 2976 if (!usb_device_supports_ltm(hcd->self.root_hub) || 2977 !usb_device_supports_ltm(udev)) include/linux/usb.h :: usb_device_supports_ltm : static inline bool usb_device_supports_ltm(struct usb_device *udev) { if (udev->speed != USB_SPEED_SUPER || !udev->bos || !udev->bos->ss_cap) return false; return udev->bos->ss_cap->bmAttributes & USB_LTM_SUPPORT; } Running through a few of the data structure offsets, it appears that we're in usb_device_supports_ltm and referencing off udev->bos->ss_cap: crash> struct -o usb_ss_cap_descriptor | grep bmAttributes [0x3] __u8 bmAttributes; >From the disassembly of usb_device_supports_ltm, RDI looks untouched, so it's probably a struct usb_device. It's kobj name is "6-1", which matches the usb device that had just reported an error in the kernel log. crash> struct usb_device ffff88103756dcd0 | grep name name = 0xffff88102aab8540 "6-1", Now to determine which invocation of usb_device_supports_ltm is bad. Chasing hcd->self.root_hub ... crash> struct usb_device ffff88103756dcd0 | grep bus bus = 0xffff88103617c290, crash> struct -o usb_hcd | grep bus [0x0] struct usb_bus self; crash> struct usb_hcd 0xffff88103617c290 | grep root_hub root_hub = 0xffff881034ad0000, crash> struct usb_device 0xffff881034ad0000 | grep bos bos = 0xffff881012d37a30, crash> struct usb_host_bos 0xffff881012d37a30 | grep ss_cap ss_cap = 0xffff88101af89d95, crash> struct usb_ss_cap_descriptor 0xffff88101af89d95 | grep bmAttributes bmAttributes = 0x2, That looks reasonable. How about udev ... crash> struct usb_device 0xffff88103756dcd0 | grep bos bos = 0xffff8810343a61b0, crash> struct usb_host_bos 0xffff8810343a61b0 | grep ss_cap ss_cap = 0x6b6b6b6b6b6b6b6b, Not so good. Crash utility confirms that the bos pointer is not currently allocated: crash> kmem 0xffff8810343a61b0 CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE ffff88103f40f800 kmalloc-64 64 80423 87781 2141 16k SLAB MEMORY NODE TOTAL ALLOCATED FREE ffffea0040d0e900 ffff8810343a4000 0 41 11 30 FREE / [ALLOCATED] ffff8810343a61b0 PAGE PHYSICAL MAPPING INDEX CNT FLAGS ffffea0040d0e980 10343a6000 dead000000000400 0 0 2fffff80000000 This is odd since usb_release_bos_descriptor not only frees bos, but also sets to NULL. Before crash, two errors were logged: usb 6-1: device descriptor read/8, error -22 which can only happen in hub_port_init. One of hub_port_init's callers is usb_reset_and_verify_device, which interestingly enough can reload udev->bos like so: usb_reset_and_verify_device bos = udev->bos [A] for (i = 0; i < SET_CONFIG_TRIES; ++i) ret = hub_port_init "usb 6-1: device descriptor read/8, error -22" [B] if (ret < 0) goto re_enumerate; re_enumerate: usb_release_bos_descriptor(udev) kfree(dev->bos) [C] dev->bos = NULL udev->bos = bos [D] re_enumerate_no_bos: hub_port_logical_disconnect(parent_hub, port1) return -ENODEV A - cache the old udev->bos B - fail to set a new udev->bos C - release udev->bos, set udev->bos to NULL D - set udev->bos to the cached value from A I'm following a hunch here, but I suspect that reboot might have stumbled into an issue inside hub_port_init and then usb_reset_and_verify_device accidentally restored udev->bos to a now freed pointer value. There's a bit of git history around usb_reset_and_verify_device and the bos pointer in particular: e3376d6c87ee usbcore: compare and release one bos descriptor in usb_reset_and_verify_device() Author: Xenia Ragiadakou <burzalodowa@xxxxxxxxx> Suggested-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> Date: Thu Aug 29 21:45:48 2013 +0300 (Fixed a memory leak when the bos descriptor changes. Releases the old udev->bos and uses the new one.) d8f00cd685f5c usb: hub: do not clear BOS field during reset device Author: Du, Changbin <changbin.du@xxxxxxxxx> Date: Mon Jan 18 21:02:42 2016 +0800 (In usb_reset_and_verify_device, removed the clearing of udev->bos to NULL (as introduced in e3376d6c87ee) and modified the function's done: path to check to see if the bos descriptor has changed, and if so, release the old udev->bos and use the new one.) I'm not a USB expert, so I don't know if Changbin's patch can be extended to include the function's error paths as well, or if usb_reset_and_verify_device can be fixed simply by jumping to re_enumerate_no_bos in the case(s) that hub_port_init doesn't allocate a new one. Hopefully the analysis is correct and can help towards a patch. Let me know if any additional information from the crash or additional tracing would be helpful. Thanks, -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html