Hello linux-usb, I've been testing surprise device hotlug removal with RHEL7 on Stratus hardware (fully redundant PCI branches) and ran into a crashing NULL-ptr bug during device initialization. The code looks the same upstream, so I'm reporting it here. Trace 1 (khubd) =============== PID: 93 TASK: ffff880178da6580 CPU: 3 COMMAND: "khubd" #0 [ffff880178dbf5d0] __schedule at ffffffff815e7bbd #1 [ffff880178dbf638] schedule at ffffffff815e80f9 #2 [ffff880178dbf648] schedule_timeout at ffffffff815e5fc5 #3 [ffff880178dbf6f0] wait_for_completion_timeout at ffffffff815e8a96 #4 [ffff880178dbf750] usb_start_wait_urb at ffffffff813fc14c #5 [ffff880178dbf7c8] usb_control_msg at ffffffff813fc31c #6 [ffff880178dbf828] usb_get_status at ffffffff813fc5dc #7 [ffff880178dbf878] hub_configure at ffffffff813f6a28 #8 [ffff880178dbf910] hub_probe at ffffffff813f73f2 #9 [ffff880178dbf958] usb_probe_interface at ffffffff81400684 #10 [ffff880178dbf9a0] driver_probe_device at ffffffff813b7077 #11 [ffff880178dbf9c8] __device_attach at ffffffff813b73bb #12 [ffff880178dbf9e8] bus_for_each_drv at ffffffff813b4edb #13 [ffff880178dbfa28] device_attach at ffffffff813b6f78 #14 [ffff880178dbfa50] bus_probe_device at ffffffff813b6278 #15 [ffff880178dbfa78] device_add at ffffffff813b3ce4 #16 [ffff880178dbfae0] usb_set_configuration at ffffffff813fe6eb #17 [ffff880178dbfb88] generic_probe at ffffffff81408b4e #18 [ffff880178dbfba8] usb_probe_device at ffffffff81400472 #19 [ffff880178dbfbc8] driver_probe_device at ffffffff813b7077 #20 [ffff880178dbfbf0] __device_attach at ffffffff813b73bb #21 [ffff880178dbfc10] bus_for_each_drv at ffffffff813b4edb #22 [ffff880178dbfc50] device_attach at ffffffff813b6f78 #23 [ffff880178dbfc78] bus_probe_device at ffffffff813b6278 #24 [ffff880178dbfca0] device_add at ffffffff813b3ce4 #25 [ffff880178dbfd08] usb_new_device at ffffffff813f3d00 #26 [ffff880178dbfd48] hub_port_connect_change at ffffffff813f5250 #27 [ffff880178dbfde0] hub_events at ffffffff813f5b29 #28 [ffff880178dbfe70] hub_thread at ffffffff813f6225 #29 [ffff880178dbfec8] kthread at ffffffff81085aff #30 [ffff880178dbff50] ret_from_fork at ffffffff815f2fac Trace 2 (ftworker io_poll, Stratus hotplug driver) ================================================== BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff813f0e3e>] recursively_mark_NOTATTACHED+0x3e/0x80 PGD 0 Oops: 0000 [#1] SMP Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables team_mode_activebackup team ccmod(POF) sg ftmod(OF) coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ipmi_devintf dm_mirror dm_region_hash dm_log mperf uinput nfsd auth_rpcgss nfs_acl lockd dm_multipath sunrpc dm_mod ext4 mbcache jbd2 raid1 sr_mod cdrom mpt2sas(OF) raid_class igb(OF) sd_mod(OF) ptp crc_t10dif pps_core crct10dif_common matroxfb(OF) dca i2c_algo_bit scsi_hbas(OF) i2c_core scsi_transport_sas videosw(OF) ipmi_msghandler usb_storage CPU: 0 PID: 12393 Comm: kworker/u64:5 Tainted: PF O-------------- 3.10.0-123.6.3.el7.bigphysarea.x86_64 #1 Hardware name: Stratus ftServer 2700/G7LAY, BIOS BIOS Version 6.3:57 12/25/2013 Workqueue: ftworker io_poll [ftmod] task: ffff88017900e580 ti: ffff880153722000 task.ti: ffff880153722000 RIP: 0010:[<ffffffff813f0e3e>] [<ffffffff813f0e3e>] recursively_mark_NOTATTACHED+0x3e/0x80 RSP: 0018:ffff880153723818 EFLAGS: 00010002 RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880163aa4a20 RBP: ffff880153723830 R08: 0000000000000092 R09: 0000000000006f6a R10: 0000000000000000 R11: ffff880153723476 R12: ffff88014e3dae68 R13: ffff880173f5d480 R14: ffff88006cb61df8 R15: 0000000000000048 FS: 0000000000000000(0000) GS:ffff88017b200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000730dd000 CR4: 00000000000407f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: 0000000000000000 ffff88006cb65cd0 ffff88014cd76458 ffff880153723858 ffffffff813f0e4b ffff88006cb65cd0 0000000000000000 ffff88006cb61ee4 ffff880153723888 ffffffff813f0f2c 0000000000000092 ffff88006cb61bd8 Call Trace: [<ffffffff813f0e4b>] recursively_mark_NOTATTACHED+0x4b/0x80 [<ffffffff813f0f2c>] usb_set_device_state+0xac/0x160 [<ffffffff813f7c03>] usb_hc_died+0xc3/0xe0 [<ffffffff81414fe8>] ehci_irq+0x228/0x530 [<ffffffff81194ae2>] ? kmem_cache_free+0x1b2/0x1d0 [<ffffffff81226a58>] ? release_sysfs_dirent+0x78/0x120 [<ffffffff813f86a8>] usb_hcd_irq+0x38/0x60 [<ffffffff81409fad>] usb_hcd_pci_remove+0x4d/0x150 [<ffffffff812ec5bb>] pci_device_remove+0x3b/0xb0 [<ffffffff813b6b9f>] __device_release_driver+0x7f/0xf0 [<ffffffff813b6c33>] device_release_driver+0x23/0x30 [<ffffffff813b63a8>] bus_remove_device+0x108/0x180 [<ffffffff813b2955>] device_del+0x135/0x1d0 [<ffffffff812e54b4>] pci_stop_bus_device+0x94/0xa0 [<ffffffff812e545b>] pci_stop_bus_device+0x3b/0xa0 [<ffffffff812e545b>] pci_stop_bus_device+0x3b/0xa0 [<ffffffff812e55a2>] pci_stop_and_remove_bus_device+0x12/0x20 [<ffffffffa031e917>] ioboard_bringdown+0xa7/0xe0 [ftmod] [<ffffffffa031ec07>] ftmod_ioboard_event+0xd7/0x400 [ftmod] [<ffffffffa0342a00>] ? OS_READ_PORT_UINT16+0x20/0x20 [ccmod] [<ffffffffa031c7b2>] io_state_change+0x92/0x130 [ftmod] [<ffffffffa03d8f34>] OsIoStateChange+0x64/0x80 [ccmod] [<ffffffffa03d9968>] IoStateChange+0x7e8/0x890 [ccmod] [<ffffffffa03dd614>] CcIoPoll+0x994/0xa70 [ccmod] [<ffffffff8101a6a9>] ? sched_clock+0x9/0x10 [<ffffffff8109a045>] ? sched_clock_cpu+0xb5/0x100 [<ffffffff810115d6>] ? __switch_to+0x136/0x490 [<ffffffffa0317b76>] CcIoPoll+0x66/0xb0 [ftmod] [<ffffffffa031c6d5>] io_poll+0x25/0x70 [ftmod] [<ffffffff8107e03b>] process_one_work+0x17b/0x460 [<ffffffff8107ee0b>] worker_thread+0x11b/0x400 [<ffffffff8107ecf0>] ? rescuer_thread+0x400/0x400 [<ffffffff81085aff>] kthread+0xcf/0xe0 [<ffffffff81085a30>] ? kthread_create_on_node+0x140/0x140 [<ffffffff815f2fac>] ret_from_fork+0x7c/0xb0 [<ffffffff81085a30>] ? kthread_create_on_node+0x140/0x140 Code: e8 f8 fe ff ff 49 89 c5 41 8b 84 24 98 04 00 00 85 c0 7e 31 66 0f 1f 84 00 00 00 00 00 49 8b 95 10 02 00 00 48 63 cb 48 8b 14 ca <48> 8b 3a 48 85 ff 74 05 e8 b5 ff ff ff 83 c3 01 41 39 9c 24 98 RIP [<ffffffff813f0e3e>] recursively_mark_NOTATTACHED+0x3e/0x80 RSP <ffff880153723818> CR2: 000000000000000 crash> dis recursively_mark_NOTATTACHED <recursively_mark_NOTATTACHED+0xd>: mov %rdi,%r12 ... <recursively_mark_NOTATTACHED+0x30>: mov 0x210(%r13),%rdx <recursively_mark_NOTATTACHED+0x37>: movslq %ebx,%rcx <recursively_mark_NOTATTACHED+0x3a>: mov (%rdx,%rcx,8),%rdx <recursively_mark_NOTATTACHED+0x3e>: mov (%rdx),%rdi << <recursively_mark_NOTATTACHED+0x41>: test %rdi,%rdi <recursively_mark_NOTATTACHED+0x44>: je 0xffffffff813f0e4b <recursively_mark_NOTATTACHED+0x4b> drivers/usb/core/hub.c: 1866 static void recursively_mark_NOTATTACHED(struct usb_device *udev) 1867 { 1868 struct usb_hub *hub = usb_hub_to_struct_hub(udev); 1869 int i; 1870 1871 for (i = 0; i < udev->maxchild; ++i) { 1872 if (hub->ports[i]->child) << 1873 recursively_mark_NOTATTACHED(hub->ports[i]->child); 1874 } 1875 if (udev->state == USB_STATE_SUSPENDED) 1876 udev->active_duration -= jiffies; 1877 udev->state = USB_STATE_NOTATTACHED; 1878 } Between the disassembly and the source code, we know that: R12 = struct usb_device ffff88014e3dae68 R13 = struct usb_hub ffff880173f5d480 crash> struct usb_hub ffff880173f5d480 | grep ports ports = 0xffff88005b8e0ab8 The ports[] pointer array is still allocated: crash> kmem 0xffff88005b8e0ab8 CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE ffff88017ac07840 kmalloc-64 64 16964 17302 422 16k SLAB MEMORY NODE TOTAL ALLOCATED FREE ffffea00016e3800 ffff88005b8e0000 0 41 24 17 FREE / [ALLOCATED] [ffff88005b8e0ab8] PAGE PHYSICAL MAPPING INDEX CNT FLAGS ffffea00016e3800 5b8e0000 0 ffff88005b8e3bb8 1 1fffff00004080 slab,head but it contains eight NULL entries: crash> struct usb_device ffff88014e3dae68 | grep maxchild maxchild = 0x8, crash> rd 0xffff88005b8e0ab8 8 ffff88005b8e0ab8: 0000000000000000 0000000000000000 ................ ffff88005b8e0ac8: 0000000000000000 0000000000000000 ................ ffff88005b8e0ad8: 0000000000000000 0000000000000000 ................ ffff88005b8e0ae8: 0000000000000000 0000000000000000 ................ Since the kernel was booted with slub debug poisoning (slub_debug=FZPU), these are probably uninitialized or reset pointers. Possible order of events and commentary: *** USB host controller inserted *** [ khubd ] [ hotplug driver ] ... hub_probe hub_configure /* setup maxchild */ hdev->maxchild = hub->descriptor->bNbrPorts; usb_get_status ... usb_start_wait_urb zzzzzz... *** USB host controller is removed *** [ khubd ] [ hotplug driver ] pci_stop_bus_device ... usb_hcd_pci_remove usb_hcd_irq ehci_irq usb_hc_died usb_set_device_state recursively_mark_NOTATTACHED /* iterate over all maxchild */ /* assume ports[x] are allocated */ for (i = 0; i < udev->maxchild; ++i) { if (hub->ports[i]->child) *** crash on hub->ports[i]-> dereference *** recursively_mark_NOTATTACHED(hub->ports[i]->child); } [ khubd ] /* port_dev would have been allocated here */ usb_hub_create_port_device port_dev = kzalloc(...) hub->ports[port1 - 1] = port_dev In summary, khubd has initialized the usb_device maxchild to 8 and provided backing-store for the usb_hub ports[] array. However, before it gets to fill in pointers for each port[] entry, the device is removed and the hotplug driver issues a pci_stop_bus_device. This removal percolates all the way down to ehci_irq and usb_hc_died. When that goes to mark the device as not attached, it expects that the ports[] pointers are valid... kaboom. I've been testing the following patch with the RHEL7 kernel with no crashes. I'm not sure if it completely solves the issue at hand, or simply covers up the crash. Comments welcome. Regards, -- Joe -->8-- -->8-- >From abb49e02e2f56ed1528198dfe242a9dd3041dc79 Mon Sep 17 00:00:00 2001 From: Joe Lawrence <joe.lawrence@xxxxxxxxxxx> Date: Fri, 5 Sep 2014 15:02:29 -0400 Subject: [PATCH] usb: hub: protect recursively_mark_NOTATTACHED and half-initialized device The recursively_mark_NOTATTACHED assumes that hub->ports[] contains valid pointers. If a device was removed after udev->maxchild was set, but before hub->ports[] was populated, recursively_mark_NOTATTACHED may try to reach through an uninitialized pointer. Since the backing memory was kzalloc'd, verify that the pointer is not-NULL before dereferencing. Signed-off-by: Joe Lawrence <joe.lawrence@xxxxxxxxxxx> --- drivers/usb/core/hub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 46f5161..fe74c9b 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -1935,7 +1935,7 @@ static void recursively_mark_NOTATTACHED(struct usb_device *udev) int i; for (i = 0; i < udev->maxchild; ++i) { - if (hub->ports[i]->child) + if (hub->ports[i] && hub->ports[i]->child) recursively_mark_NOTATTACHED(hub->ports[i]->child); } if (udev->state == USB_STATE_SUSPENDED) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html