crash in recursively_mark_NOTATTACHED

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello linux-usb,

I've been testing surprise device hotlug removal with RHEL7 on Stratus
hardware (fully redundant PCI branches) and ran into a crashing NULL-ptr
bug during device initialization.  The code looks the same upstream, so
I'm reporting it here.

Trace 1 (khubd)
===============

PID: 93     TASK: ffff880178da6580  CPU: 3   COMMAND: "khubd"
 #0 [ffff880178dbf5d0] __schedule at ffffffff815e7bbd
 #1 [ffff880178dbf638] schedule at ffffffff815e80f9
 #2 [ffff880178dbf648] schedule_timeout at ffffffff815e5fc5
 #3 [ffff880178dbf6f0] wait_for_completion_timeout at ffffffff815e8a96
 #4 [ffff880178dbf750] usb_start_wait_urb at ffffffff813fc14c
 #5 [ffff880178dbf7c8] usb_control_msg at ffffffff813fc31c
 #6 [ffff880178dbf828] usb_get_status at ffffffff813fc5dc
 #7 [ffff880178dbf878] hub_configure at ffffffff813f6a28
 #8 [ffff880178dbf910] hub_probe at ffffffff813f73f2
 #9 [ffff880178dbf958] usb_probe_interface at ffffffff81400684
#10 [ffff880178dbf9a0] driver_probe_device at ffffffff813b7077
#11 [ffff880178dbf9c8] __device_attach at ffffffff813b73bb
#12 [ffff880178dbf9e8] bus_for_each_drv at ffffffff813b4edb
#13 [ffff880178dbfa28] device_attach at ffffffff813b6f78
#14 [ffff880178dbfa50] bus_probe_device at ffffffff813b6278
#15 [ffff880178dbfa78] device_add at ffffffff813b3ce4
#16 [ffff880178dbfae0] usb_set_configuration at ffffffff813fe6eb
#17 [ffff880178dbfb88] generic_probe at ffffffff81408b4e
#18 [ffff880178dbfba8] usb_probe_device at ffffffff81400472
#19 [ffff880178dbfbc8] driver_probe_device at ffffffff813b7077
#20 [ffff880178dbfbf0] __device_attach at ffffffff813b73bb
#21 [ffff880178dbfc10] bus_for_each_drv at ffffffff813b4edb
#22 [ffff880178dbfc50] device_attach at ffffffff813b6f78
#23 [ffff880178dbfc78] bus_probe_device at ffffffff813b6278
#24 [ffff880178dbfca0] device_add at ffffffff813b3ce4
#25 [ffff880178dbfd08] usb_new_device at ffffffff813f3d00
#26 [ffff880178dbfd48] hub_port_connect_change at ffffffff813f5250
#27 [ffff880178dbfde0] hub_events at ffffffff813f5b29
#28 [ffff880178dbfe70] hub_thread at ffffffff813f6225
#29 [ffff880178dbfec8] kthread at ffffffff81085aff
#30 [ffff880178dbff50] ret_from_fork at ffffffff815f2fac


Trace 2 (ftworker io_poll, Stratus hotplug driver)
==================================================

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff813f0e3e>] recursively_mark_NOTATTACHED+0x3e/0x80
PGD 0 
Oops: 0000 [#1] SMP 
Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables team_mode_activebackup team ccmod(POF) sg ftmod(OF) coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ipmi_devintf dm_mirror dm_region_hash dm_log mperf uinput nfsd auth_rpcgss nfs_acl lockd dm_multipath sunrpc dm_mod ext4 mbcache jbd2 raid1 sr_mod cdrom mpt2sas(OF) raid_class igb(OF) sd_mod(OF) ptp crc_t10dif pps_core crct10dif_common matroxfb(OF) dca i2c_algo_bit scsi_hbas(OF) i2c_core scsi_transport_sas videosw(OF) ipmi_msghandler usb_storage
CPU: 0 PID: 12393 Comm: kworker/u64:5 Tainted: PF          O--------------   3.10.0-123.6.3.el7.bigphysarea.x86_64 #1
Hardware name: Stratus ftServer 2700/G7LAY, BIOS BIOS Version 6.3:57 12/25/2013
Workqueue: ftworker io_poll [ftmod]
task: ffff88017900e580 ti: ffff880153722000 task.ti: ffff880153722000
RIP: 0010:[<ffffffff813f0e3e>]  [<ffffffff813f0e3e>] recursively_mark_NOTATTACHED+0x3e/0x80
RSP: 0018:ffff880153723818  EFLAGS: 00010002
RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880163aa4a20
RBP: ffff880153723830 R08: 0000000000000092 R09: 0000000000006f6a
R10: 0000000000000000 R11: ffff880153723476 R12: ffff88014e3dae68
R13: ffff880173f5d480 R14: ffff88006cb61df8 R15: 0000000000000048
FS:  0000000000000000(0000) GS:ffff88017b200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000730dd000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 0000000000000000 ffff88006cb65cd0 ffff88014cd76458 ffff880153723858
 ffffffff813f0e4b ffff88006cb65cd0 0000000000000000 ffff88006cb61ee4
 ffff880153723888 ffffffff813f0f2c 0000000000000092 ffff88006cb61bd8
Call Trace:
 [<ffffffff813f0e4b>] recursively_mark_NOTATTACHED+0x4b/0x80
 [<ffffffff813f0f2c>] usb_set_device_state+0xac/0x160
 [<ffffffff813f7c03>] usb_hc_died+0xc3/0xe0
 [<ffffffff81414fe8>] ehci_irq+0x228/0x530
 [<ffffffff81194ae2>] ? kmem_cache_free+0x1b2/0x1d0
 [<ffffffff81226a58>] ? release_sysfs_dirent+0x78/0x120
 [<ffffffff813f86a8>] usb_hcd_irq+0x38/0x60
 [<ffffffff81409fad>] usb_hcd_pci_remove+0x4d/0x150
 [<ffffffff812ec5bb>] pci_device_remove+0x3b/0xb0
 [<ffffffff813b6b9f>] __device_release_driver+0x7f/0xf0
 [<ffffffff813b6c33>] device_release_driver+0x23/0x30
 [<ffffffff813b63a8>] bus_remove_device+0x108/0x180
 [<ffffffff813b2955>] device_del+0x135/0x1d0
 [<ffffffff812e54b4>] pci_stop_bus_device+0x94/0xa0
 [<ffffffff812e545b>] pci_stop_bus_device+0x3b/0xa0
 [<ffffffff812e545b>] pci_stop_bus_device+0x3b/0xa0
 [<ffffffff812e55a2>] pci_stop_and_remove_bus_device+0x12/0x20
 [<ffffffffa031e917>] ioboard_bringdown+0xa7/0xe0 [ftmod]
 [<ffffffffa031ec07>] ftmod_ioboard_event+0xd7/0x400 [ftmod]
 [<ffffffffa0342a00>] ? OS_READ_PORT_UINT16+0x20/0x20 [ccmod]
 [<ffffffffa031c7b2>] io_state_change+0x92/0x130 [ftmod]
 [<ffffffffa03d8f34>] OsIoStateChange+0x64/0x80 [ccmod]
 [<ffffffffa03d9968>] IoStateChange+0x7e8/0x890 [ccmod]
 [<ffffffffa03dd614>] CcIoPoll+0x994/0xa70 [ccmod]
 [<ffffffff8101a6a9>] ? sched_clock+0x9/0x10
 [<ffffffff8109a045>] ? sched_clock_cpu+0xb5/0x100
 [<ffffffff810115d6>] ? __switch_to+0x136/0x490
 [<ffffffffa0317b76>] CcIoPoll+0x66/0xb0 [ftmod]
 [<ffffffffa031c6d5>] io_poll+0x25/0x70 [ftmod]
 [<ffffffff8107e03b>] process_one_work+0x17b/0x460
 [<ffffffff8107ee0b>] worker_thread+0x11b/0x400
 [<ffffffff8107ecf0>] ? rescuer_thread+0x400/0x400
 [<ffffffff81085aff>] kthread+0xcf/0xe0
 [<ffffffff81085a30>] ? kthread_create_on_node+0x140/0x140
 [<ffffffff815f2fac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81085a30>] ? kthread_create_on_node+0x140/0x140
Code: e8 f8 fe ff ff 49 89 c5 41 8b 84 24 98 04 00 00 85 c0 7e 31 66 0f 1f 84 00 00 00 00 00 49 8b 95 10 02 00 00 48 63 cb 48 8b 14 ca <48> 8b 3a 48 85 ff 74 05 e8 b5 ff ff ff 83 c3 01 41 39 9c 24 98 
RIP  [<ffffffff813f0e3e>] recursively_mark_NOTATTACHED+0x3e/0x80
 RSP <ffff880153723818>
CR2: 000000000000000

crash> dis recursively_mark_NOTATTACHED
  <recursively_mark_NOTATTACHED+0xd>:  mov    %rdi,%r12
...
  <recursively_mark_NOTATTACHED+0x30>: mov    0x210(%r13),%rdx
  <recursively_mark_NOTATTACHED+0x37>: movslq %ebx,%rcx
  <recursively_mark_NOTATTACHED+0x3a>: mov    (%rdx,%rcx,8),%rdx
  <recursively_mark_NOTATTACHED+0x3e>: mov    (%rdx),%rdi            <<
  <recursively_mark_NOTATTACHED+0x41>: test   %rdi,%rdi
  <recursively_mark_NOTATTACHED+0x44>: je     0xffffffff813f0e4b <recursively_mark_NOTATTACHED+0x4b>

drivers/usb/core/hub.c:

1866 static void recursively_mark_NOTATTACHED(struct usb_device *udev)
1867 {
1868         struct usb_hub *hub = usb_hub_to_struct_hub(udev);
1869         int i;
1870 
1871         for (i = 0; i < udev->maxchild; ++i) {
1872                 if (hub->ports[i]->child)                       <<
1873                         recursively_mark_NOTATTACHED(hub->ports[i]->child);
1874         }
1875         if (udev->state == USB_STATE_SUSPENDED)
1876                 udev->active_duration -= jiffies;
1877         udev->state = USB_STATE_NOTATTACHED;
1878 }

Between the disassembly and the source code, we know that:

R12 = struct usb_device ffff88014e3dae68
R13 = struct usb_hub ffff880173f5d480

crash> struct usb_hub ffff880173f5d480 | grep ports
  ports = 0xffff88005b8e0ab8

The ports[] pointer array is still allocated:

crash> kmem 0xffff88005b8e0ab8
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
ffff88017ac07840 kmalloc-64                64      16964     17302    422    16k
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffea00016e3800  ffff88005b8e0000     0     41         24    17
  FREE / [ALLOCATED]
  [ffff88005b8e0ab8]

      PAGE       PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffea00016e3800 5b8e0000                0 ffff88005b8e3bb8  1 1fffff00004080 slab,head

but it contains eight NULL entries:

crash> struct usb_device ffff88014e3dae68 | grep maxchild
  maxchild = 0x8,

crash> rd 0xffff88005b8e0ab8 8
ffff88005b8e0ab8:  0000000000000000 0000000000000000   ................
ffff88005b8e0ac8:  0000000000000000 0000000000000000   ................
ffff88005b8e0ad8:  0000000000000000 0000000000000000   ................
ffff88005b8e0ae8:  0000000000000000 0000000000000000   ................

Since the kernel was booted with slub debug poisoning (slub_debug=FZPU),
these are probably uninitialized or reset pointers.


Possible order of events and commentary:

          *** USB host controller inserted ***

[ khubd ]                          [ hotplug driver ]
...
  hub_probe
    hub_configure

      /* setup maxchild */

      hdev->maxchild =
        hub->descriptor->bNbrPorts;

      usb_get_status
        ...
          usb_start_wait_urb
            zzzzzz...

         *** USB host controller is removed ***

[ khubd ]                          [ hotplug driver ]
                                   
                                   pci_stop_bus_device
                                     ...
                                       usb_hcd_pci_remove
                                         usb_hcd_irq
                                           ehci_irq
                                             usb_hc_died
                                               usb_set_device_state
                                                 recursively_mark_NOTATTACHED

                                                   /* iterate over all maxchild */
                                                   /* assume ports[x] are allocated */

                                                   for (i = 0; i < udev->maxchild; ++i) {
                                                           if (hub->ports[i]->child)

          *** crash on hub->ports[i]-> dereference ***

                                                                   recursively_mark_NOTATTACHED(hub->ports[i]->child);
                                                   }

 
[ khubd ]

      /* port_dev would have been
         allocated here */ 

      usb_hub_create_port_device
        port_dev = kzalloc(...)
        hub->ports[port1 - 1] = port_dev


In summary, khubd has initialized the usb_device maxchild to 8 and
provided backing-store for the usb_hub ports[] array.  However, before
it gets to fill in pointers for each port[] entry, the device is removed
and the hotplug driver issues a pci_stop_bus_device.  This removal
percolates all the way down to ehci_irq and usb_hc_died.  When that goes
to mark the device as not attached, it expects that the ports[] pointers
are valid... kaboom.

I've been testing the following patch with the RHEL7 kernel with no
crashes.  I'm not sure if it completely solves the issue at hand, or
simply covers up the crash.  Comments welcome.

Regards,

-- Joe

-->8-- -->8--

>From abb49e02e2f56ed1528198dfe242a9dd3041dc79 Mon Sep 17 00:00:00 2001
From: Joe Lawrence <joe.lawrence@xxxxxxxxxxx>
Date: Fri, 5 Sep 2014 15:02:29 -0400
Subject: [PATCH] usb: hub: protect recursively_mark_NOTATTACHED and
 half-initialized device

The recursively_mark_NOTATTACHED assumes that hub->ports[] contains valid
pointers. If a device was removed after udev->maxchild was set, but before
hub->ports[] was populated, recursively_mark_NOTATTACHED may try to reach
through an uninitialized pointer. Since the backing memory was kzalloc'd,
verify that the pointer is not-NULL before dereferencing.

Signed-off-by: Joe Lawrence <joe.lawrence@xxxxxxxxxxx>
---
 drivers/usb/core/hub.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 46f5161..fe74c9b 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1935,7 +1935,7 @@ static void recursively_mark_NOTATTACHED(struct usb_device *udev)
 	int i;
 
 	for (i = 0; i < udev->maxchild; ++i) {
-		if (hub->ports[i]->child)
+		if (hub->ports[i] && hub->ports[i]->child)
 			recursively_mark_NOTATTACHED(hub->ports[i]->child);
 	}
 	if (udev->state == USB_STATE_SUSPENDED)
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux