modprobe -r hangs XHCI and panics on dwc3-omap

Hi,

I've been tracking down two issues and one of them seems to be a problem
with either usbcore or xhci.

DWC3, when acting as host, instantiates an xhci platform-device and sets
itself as its parent. That's all fine and dandy until I try to
modprobe -r dwc3.ko, which causes XHCI to hang:

| # lsmod
| Module                  Size  Used by
| xhci_hcd              116180  0 
| dwc3                   46765  0 
| udc_core               10472  1 dwc3
| dwc3_omap               5402  0 
| matrix_keypad           7218  0 
| lis3lv02d_i2c           3718  0 
| lis3lv02d              16439  1 lis3lv02d_i2c
| input_polldev           5315  1 lis3lv02d
| # lsusb
| Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
| Bus 001 Device 005: ID 0b95:7720 ASIX Electronics Corp. AX88772
| Bus 001 Device 004: ID 1a40:0101 Terminus Technology Inc. 4-Port HUB
| Bus 001 Device 003: ID 0403:6001 Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC
| Bus 001 Device 002: ID 1a40:0201 Terminus Technology Inc. FE 2.1 7-port Hub
| Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
| # modprobe -r dwc3
| [   53.016798] xhci-hcd xhci-hcd.0.auto: remove, state 4
| [   53.023083] usb usb2: USB disconnect, device number 1
| [   53.082845] xhci-hcd xhci-hcd.0.auto: Host not halted after 16000 microseconds.
| [   53.090732] xhci-hcd xhci-hcd.0.auto: USB bus 2 deregistered
| [   53.112511] xhci-hcd xhci-hcd.0.auto: remove, state 1
| [   53.117883] usb usb1: USB disconnect, device number 1
| [   53.123301] usb 1-1: USB disconnect, device number 2
| [   53.128503] usb 1-1.6: USB disconnect, device number 3
| [   90.539781] INFO: task modprobe:1792 blocked for more than 30 seconds.
| [   90.546607]       Not tainted 3.17.0-rc2-00004-ge0b64425 #800
| [   90.552672] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
| [   90.560855] modprobe        D c06bf5a0     0  1792   1662 0x00000000
| [   90.567541] [<c06bf5a0>] (__schedule) from [<c06bfa94>] (schedule+0x40/0x8c)
| [   90.574925] [<c06bfa94>] (schedule) from [<c06c3e48>] (schedule_timeout+0x154/0x220)
| [   90.583031] [<c06c3e48>] (schedule_timeout) from [<c06c0554>] (wait_for_common+0xdc/0x178)
| [   90.591672] [<c06c0554>] (wait_for_common) from [<c06c0610>] (wait_for_completion+0x20/0x24)
| [   90.600537] [<c06c0610>] (wait_for_completion) from [<bf0569d4>] (xhci_configure_endpoint+0xc8/0x590 [xhci_hcd])
| [   90.611226] [<bf0569d4>] (xhci_configure_endpoint [xhci_hcd]) from [<bf057664>] (xhci_check_bandwidth+0x16c/0x294 [xhci_hcd])
| [   90.623100] [<bf057664>] (xhci_check_bandwidth [xhci_hcd]) from [<c04e5578>] (usb_hcd_alloc_bandwidth+0x1dc/0x320)
| [   90.633938] [<c04e5578>] (usb_hcd_alloc_bandwidth) from [<c04e8160>] (usb_disable_device+0x198/0x1f8)
| [   90.643586] [<c04e8160>] (usb_disable_device) from [<c04df3fc>] (usb_disconnect+0x7c/0x224)
| [   90.652323] [<c04df3fc>] (usb_disconnect) from [<c04df54c>] (usb_disconnect+0x1cc/0x224)
| [   90.660778] 8 locks held by modprobe/1792:
| [   90.665055]  #0:  (&dev->mutex){......}, at: [<c0439c04>] driver_detach+0x54/0xc8
| [   90.672929]  #1:  (&dev->mutex){......}, at: [<c0439c10>] driver_detach+0x60/0xc8
| [   90.680798]  #2:  (&dev->mutex){......}, at: [<c0439524>] device_release_driver+0x28/0x3c
| [   90.689373]  #3:  (usb_bus_list_lock){+.+.+.}, at: [<c04e4e04>] usb_remove_hcd+0xa0/0x1b4
| [   90.697971]  #4:  (&dev->mutex){......}, at: [<c04df3d0>] usb_disconnect+0x50/0x224
| [   90.706022]  #5:  (&dev->mutex){......}, at: [<c04df3d0>] usb_disconnect+0x50/0x224
| [   90.714069]  #6:  (&dev->mutex){......}, at: [<c04df3d0>] usb_disconnect+0x50/0x224
| [   90.722109]  #7:  (hcd->bandwidth_mutex){+.+.+.}, at: [<c04e814c>] usb_disable_device+0x184/0x1f8

This only happens when I have devices attached to the XHCI port on my
platform (AM437x, but I suppose any XHCI would die similarly if you can
destroy the underlying {platform,pci}_device).

If I first remove xhci then remove dwc3, it works fine:

| # lsmod
| Module                  Size  Used by
| xhci_hcd              116180  0 
| dwc3                   46765  0 
| udc_core               10472  1 dwc3
| matrix_keypad           7218  0 
| dwc3_omap               5402  0 
| lis3lv02d_i2c           3718  0 
| lis3lv02d              16439  1 lis3lv02d_i2c
| input_polldev           5315  1 lis3lv02d
| # lsusb
| Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
| Bus 001 Device 005: ID 0b95:7720 ASIX Electronics Corp. AX88772
| Bus 001 Device 004: ID 1a40:0101 Terminus Technology Inc. 4-Port HUB
| Bus 001 Device 003: ID 0403:6001 Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC
| Bus 001 Device 002: ID 1a40:0201 Terminus Technology Inc. FE 2.1 7-port Hub
| Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
| # modprobe -r xhci-hcd
| [   38.895745] xhci-hcd xhci-hcd.0.auto: remove, state 4
| [   38.902034] usb usb2: USB disconnect, device number 1
| [   38.933439] xhci-hcd xhci-hcd.0.auto: USB bus 2 deregistered
| [   38.945408] xhci-hcd xhci-hcd.0.auto: remove, state 1
| [   38.950968] usb usb1: USB disconnect, device number 1
| [   38.956280] usb 1-1: USB disconnect, device number 2
| [   38.961563] usb 1-1.6: USB disconnect, device number 3
| [   38.980267] usb 1-1.7: USB disconnect, device number 4
| [   38.985710] usb 1-1.7.4: USB disconnect, device number 5
| [   38.994068] asix 1-1.7.4:1.0 eth1: unregister 'asix' usb-xhci-hcd.0.auto-1.7.4, ASIX AX88772 USB 2.0 Ethernet
| [   39.122913] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
| # modprobe -r dwc3
| # 

It also works fine if I don't have anything attached to the XHCI port:

| # lsmod
| Module                  Size  Used by
| xhci_hcd              116180  0 
| dwc3                   46765  0 
| udc_core               10472  1 dwc3
| matrix_keypad           7218  0 
| dwc3_omap               5402  0 
| lis3lv02d_i2c           3718  0 
| lis3lv02d              16439  1 lis3lv02d_i2c
| input_polldev           5315  1 lis3lv02d
| # lsusb
| Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
| Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
| # modprobe -r dwc3
| [   63.910052] xhci-hcd xhci-hcd.0.auto: remove, state 4
| [   63.915429] usb usb2: USB disconnect, device number 1
| [   63.959522] xhci-hcd xhci-hcd.0.auto: Host not halted after 16000 microseconds.
| [   63.967461] xhci-hcd xhci-hcd.0.auto: USB bus 2 deregistered
| [   63.981720] xhci-hcd xhci-hcd.0.auto: remove, state 4
| [   63.987160] usb usb1: USB disconnect, device number 1
| [   64.006709] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered

If you want to know, this is running v3.17-rc2, but I know that at least
v3.14 also exhibits the same problem. Any suggestions on how to get this
thing sorted out? I'm pretty much running out of ideas :-s



The second problem I have is exposed because I reverted commit c5a1fbc
(usb: dwc3: dwc3-omap: Fix the crash on module removal). That fix is
wrong: it had the side effect of modprobe -r dwc3-omap *NOT* destroying
the platform_device for dwc3.ko, so dwc3.ko would never be unprobed
and its resources would never be destroyed.

I traced this one down to __release_resource() hitting a NULL pointer
dereference when grabbing a pointer to old->parent->child, but I can't
figure out exactly what is wrong there. It doesn't seem to me that
old->parent or old->parent->child should ever be NULL... Any ideas?

| # modprobe -r dwc3-omap
| [  539.835401] Unable to handle kernel NULL pointer dereference at virtual address 00000018
| [  539.844043] pgd = eb83c000
| [  539.846893] [00000018] *pgd=00000000
| [  539.850734] Internal error: Oops: 5 [#1] SMP ARM
| [  539.855588] Modules linked in: xhci_hcd matrix_keypad dwc3_omap(-) lis3lv02d_i2c lis3lv02d input_polldev [last unloaded: udc_core]
| [  539.867977] CPU: 0 PID: 1878 Comm: modprobe Not tainted 3.17.0-rc2-00004-ge0b64425 #800
| [  539.876384] task: ed0d4040 ti: ed07c000 task.ti: ed07c000
| [  539.882076] PC is at release_resource+0x24/0x90
| [  539.886847] LR is at lock_acquired+0x280/0x3b8
| [  539.891509] pc : [<c004eba8>]    lr : [<c0091f8c>]    psr: 60000013
| [  539.891509] sp : ed07ddf0  ip : ed07dd80  fp : ed07de04
| [  539.903570] r10: 00000000  r9 : ed07c000  r8 : c000f064
| [  539.909061] r7 : 00000081  r6 : c0577eec  r5 : ed564c00  r4 : eb97da80
| [  539.915900] r3 : 00000000  r2 : 00000000  r1 : 60000013  r0 : c004eba4
| [  539.922740] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
| [  539.930238] Control: 10c5387d  Table: ab83c059  DAC: 00000015
| [  539.936274] Process modprobe (pid: 1878, stack limit = 0xed07c248)
| [  539.942751] Stack: (0xed07ddf0 to 0xed07e000)
| [  539.947324] dde0:                                     00000001 ed564c00 ed07de1c ed07de08
| [  539.955897] de00: c043b670 c004eb90 ed564c00 00000000 ed07de34 ed07de20 c043b6bc c043b600
| [  539.964476] de20: c0c55528 ed564c10 ed07de4c ed07de38 c0577f78 c043b6ac ed564c10 00000000
| [  539.973065] de40: ed07de74 ed07de50 c0435ac4 c0577ef8 ed20eb40 ed487578 ed07de84 ed20f410
| [  539.981649] de60: ed210010 ed210044 ed07de84 ed07de78 c0577ee4 c0435a7c ed07de9c ed07de88
| [  539.990206] de80: bf013310 c0577ed0 ed210010 bf013e6c ed07deac ed07dea0 c043b010 bf0132c4
| [  539.998764] dea0: ed07dec4 ed07deb0 c04394a8 c043aff4 ed210010 bf013e6c ed07dee4 ed07dec8
| [  540.007339] dec0: c0439c74 c0439434 ed0d4040 bf013e6c 00000000 00000800 ed07defc ed07dee8
| [  540.015914] dee0: c04391a4 c0439bbc bf01391c bf013e6c ed07df14 ed07df00 c043a4e4 c0439154
| [  540.024498] df00: bf01391c bf013eb0 ed07df24 ed07df18 c043b7c4 c043a4b8 ed07df34 ed07df28
| [  540.033082] df20: bf013930 c043b7b4 ed07dfa4 ed07df38 c00cab3c bf013928 ed07df54 00000000
| [  540.041650] df40: bf013eb0 00000800 ed07df3c 33637764 616d6f5f 00000070 ed07df84 ed07df68
| [  540.050246] df60: c00906a4 c00904ec b7007220 b7007254 00000000 00000081 ed07df94 ed07df88
| [  540.058818] df80: c00907fc 00090584 00000000 b7007220 b7007254 00000000 00000000 ed07dfa8
| [  540.067412] dfa0: c000ede0 c00caa28 b7007220 b7007254 b7007254 00000800 b7006000 000254b8
| [  540.076003] dfc0: b7007220 b7007254 00000000 00000081 b7007254 00000001 b7007008 b70072b0
| [  540.084595] dfe0: b6f31420 be99b76c b6feff98 b6f3142c 60000010 b7007254 ed064e2b 50b60016
| [  540.093219] [<c004eba8>] (release_resource) from [<c043b670>] (platform_device_del+0x7c/0xac)
| [  540.102181] [<c043b670>] (platform_device_del) from [<c043b6bc>] (platform_device_unregister+0x1c/0x30)
| [  540.112048] [<c043b6bc>] (platform_device_unregister) from [<c0577f78>] (of_platform_device_destroy+0x8c/0x98)
| [  540.122557] [<c0577f78>] (of_platform_device_destroy) from [<c0435ac4>] (device_for_each_child+0x54/0x80)
| [  540.132612] [<c0435ac4>] (device_for_each_child) from [<c0577ee4>] (of_platform_depopulate+0x20/0x28)
| [  540.142312] [<c0577ee4>] (of_platform_depopulate) from [<bf013310>] (dwc3_omap_remove+0x58/0x78 [dwc3_omap])
| [  540.152634] [<bf013310>] (dwc3_omap_remove [dwc3_omap]) from [<c043b010>] (platform_drv_remove+0x28/0x2c)
| [  540.162665] [<c043b010>] (platform_drv_remove) from [<c04394a8>] (__device_release_driver+0x80/0xd4)
| [  540.172233] [<c04394a8>] (__device_release_driver) from [<c0439c74>] (driver_detach+0xc4/0xc8)
| [  540.181251] [<c0439c74>] (driver_detach) from [<c04391a4>] (bus_remove_driver+0x5c/0xb0)
| [  540.189750] [<c04391a4>] (bus_remove_driver) from [<c043a4e4>] (driver_unregister+0x38/0x58)
| [  540.198601] [<c043a4e4>] (driver_unregister) from [<c043b7c4>] (platform_driver_unregister+0x1c/0x20)
| [  540.208274] [<c043b7c4>] (platform_driver_unregister) from [<bf013930>] (dwc3_omap_driver_exit+0x14/0x1c [dwc3_omap])
| [  540.219407] [<bf013930>] (dwc3_omap_driver_exit [dwc3_omap]) from [<c00cab3c>] (SyS_delete_module+0x120/0x1b0)
| [  540.229943] [<c00cab3c>] (SyS_delete_module) from [<c000ede0>] (ret_fast_syscall+0x0/0x48)
| [  540.238617] Code: e1a04000 e59f006c eb19da12 e5943010 (e5932018) 
| [  540.245128] ---[ end trace ee0e6e3f9c9ba6ac ]---
| [  540.249985] note: modprobe[1878] exited with preempt_count 1
| Segmentation fault
| # 

FYI, the PC is at line 241 of kernel/resource.c:

| (gdb) l *(release_resource + 0x24)
| 0xc004eba8 is in release_resource (kernel/resource.c:241).
| 236     {
| 237             struct resource *tmp, **p;
| 238
| 239             p = &old->parent->child;
| 240             for (;;) {
| 241                     tmp = *p;
| 242                     if (!tmp)
| 243                             break;
| 244                     if (tmp == old) {
| 245                             *p = tmp->sibling;

Based on that, the faulting load is tmp = *p with p = &old->parent->child.
A NULL old->parent->child alone wouldn't fault (tmp would be NULL and the
loop would break at line 243), and the faulting virtual address is 0x18
(a 24-byte offset), which, if I calculate correctly, is the offset of
child inside struct resource.

So old->parent is NULL, and &NULL->child == 0x18.

cheers

-- 
balbi
