Suspected (out of tree) HCI issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi.

(please keep me cc'ed in replies, I'm not subscribed)

I'm trying OpenWRT on a DSL modem[1] which has an out-of-tree HCI
driver. I'm getting one particular process to hang in D 100% of time
when talking to a 3G USB modem. I'm on current default OpenWRT kernel,
3.14.18 .

Here is the hung process:

root@OpenWrt:/# echo w > /proc/sysrq-trigger
[   98.160000] SysRq : Show Blocked State
[   98.160000]   task                PC stack   pid father
[   98.160000] gcom            D 8030861c     0  2228   2188 0x08100020
[   98.160000] Stack : 81bde880 00000001 00000000 80307b18 81bde880 80435698 81bde88c 80414c60
	  81d213c0 00010000 80410000 81bb7180 00000001 8030861c 81bcd480 81452e00
	  81452e00 803a0000 00000000 814bf070 8007fc04 80435698 80435698 81452e00
	  803a0000 00000004 81d21380 00000010 81be2800 81828900 00000000 81d08c28
	  00000000 00000000 81be2804 81be2884 81452e00 803a0000 00000000 81d08c28
	  ...
[   98.160000] Call Trace:
[   98.160000] [<8000d954>] __schedule+0x4cc/0x578
[   98.160000] [<8030861c>] usb_kill_urb+0xe0/0x164
[   98.160000] [<81828900>] usb_wwan_close+0x100/0x2e4 [usb_wwan]
[   98.160000] [<802b6e8c>] tty_port_shutdown+0xc0/0xdc
[   98.160000] [<802b64ac>] tty_port_close+0x30/0xa4
[   98.160000] [<802b7918>] tty_release+0x168/0x524
[   98.160000] [<80041038>] __fput+0xf8/0x274
[   98.160000] [<80289560>] task_work_run+0xf0/0x128
[   98.160000] [<800ef3d0>] do_exit+0x3fc/0x858
[   98.160000] [<800f154c>] do_group_exit+0x78/0xb4
[   98.160000] [<800198a8>] SyS_faccessat+0x0/0x250

As can be seen, it is while cleaning up process resources on exit (at
kernel level) that the hang happens. If I unplug the modem, process
exits and I get the following output:
[ 2862.228000] option1 ttyUSB0: usb_wwan_indat_callback: resubmit read urb failed. (-19)
[ 2862.236000] option1 ttyUSB0: usb_wwan_indat_callback: resubmit read urb failed. (-19)
[ 2862.244000] option1 ttyUSB0: usb_wwan_indat_callback: resubmit read urb failed. (-19)
[ 2862.252000] option1 ttyUSB0: usb_wwan_indat_callback: resubmit read urb failed. (-19)
[ 2862.260000] option1 ttyUSB0: option_instat_callback: error -150

Here is the HCD driver source code:
  http://git.openwrt.org/?p=openwrt.git;a=tree;f=package/kernel/lantiq/ltq-hcd/src;h=926980738d4eb39d78fdf116fe4f2f787040fa1d;hb=fee2d7ec702586850e0eafbd82f43c851d1c45b3
(to the revision I'm at locally, which is 2 days old as I'm writing
this, and no change since touch this path)

Poking around with printk debugging (*cough*), I found that the URB
__schedule is waiting upon has a use_count of 1. Adding printks after
the 2 atomic_dec and the only atomic_inc on it, I see this URB's history
as:

[   99.284000] 81ae2900->use_count 1 drivers/usb/core/hcd.c:1553
[   99.916000] 81ae2900->use_count 2 drivers/usb/core/hcd.c:1553
[   99.916000] 81ae2900->use_count 1 drivers/usb/core/hcd.c:1691
[  100.332000] 81ae2900->use_count 2 drivers/usb/core/hcd.c:1553
[  100.332000] 81ae2900->use_count 1 drivers/usb/core/hcd.c:1691
[  101.436000] IFXUSB: 81ae2900 urbd->phase 0
[  101.440000] 81ae2900->use_count 2 drivers/usb/core/hcd.c:1553
[  101.440000] 81ae2900->use_count 1 drivers/usb/core/hcd.c:1691
[  101.512000] wait_event 81ae2900

1553 is right after usb_hcd_submit_urb's atomic_inc
1691 is right after __usb_hcd_giveback_urb's atomic_dec
usb_hcd_submit_urb's atomic_dec is not hit.

The IFXUSB line is the HCD driver's ifxhcd_urb_dequeue telling it is
dequeueing the URB, currently in given phase (here 0, so URBD_IDLE).
wait_event line is usb_kill_urb calling wait_event or given URB.
This URB does not appear at any time before 99.284, when I started gcom.

Is it legal for use_count to exceed 1 ?
What would it mean ?
Eyeballing usb_wwan I could not see any obvious issue. I saw a large
patch set for this driver since 3.14.18 and copied it over (as of
linux-stable b0a9aa6da8088b722326a858ab572a13b5b6f9cb), with the only
visible improvement of not getting errors when unplugging the modem on
stuck process.
Is my debugging approach even reliable ? I'm reading use_count right
after an atomic_inc, which looks like a bad idea...
Any idea on what I should search for/look at next ?

I tried calling dump_stack after 1553, but kernel fails booting (maybe
URBs timing out just because of printks going through 115200 bauds
serial link ?). After ~260 seconds the router reboots (no error
message, maybe out of memory because of message buffer growing faster
than serial empties it ?).
If I only call it when count reaches 2, kernel boots and gives
stack traces I do not understand, such as (extracted from above run):

[   99.916000] CPU: 0 PID: 1413 Comm: logd Not tainted 3.14.18 #38
[   99.916000] Stack : 00000000 00000000 00000000 00000000 8095cdbe 00000033 838bd7e8 803f0000
          803c1644 8042136f 00000585 80953b00 838bd7e8 803f0000 81d395a8 00000020
          0000000a 802f9998 00000000 80211878 00000000 00000000 803c49a4 838b5ccc
          00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
          00000000 00000000 00000000 00000000 00000000 00000000 00000000 838b5c58
          ...
[   99.916000] Call Trace:
[   99.916000] [<80260c70>] show_stack+0x48/0x70
[   99.916000] [<803071d0>] usb_hcd_submit_urb+0xdc/0xa24
[   99.916000] [<8194aac0>] usb_wwan_close+0x298/0x2e8 [usb_wwan]
[   99.916000] [<80066610>] __wake_up+0x28/0x48
[   99.916000] [<81d3467c>] ifxhcd_hc_cleanup+0xae8/0xba4 [ltq_hcd_vr9]

[  100.332000] CPU: 0 PID: 973 Comm: kworker/0:2 Not tainted 3.14.18 #38
[  100.332000] Workqueue: events usb_serial_suspend [usbserial]
[  100.332000] Stack : 818c8680 803c1644 80414c60 81ae2908 803f0000 803f0000 81d395a8 00000020
          0000000a 802f9998 000003cd 80211ffc 00000000 00000000 803e2e00 818bdb34
          818bdb34 802f9998 00000000 80211aa4 00000000 000005f2 000005f2 8094d7e0
          00000000 00000000 00000000 00000000 00000000 00000000 6576656e 74730000
          00000000 00000000 00000000 00000000 8381bb80 81085500 818507d4 818bdae0
          ...
[  100.332000] Call Trace:
[  100.332000] [<80260c70>] show_stack+0x48/0x70
[  100.332000] [<803071d0>] usb_hcd_submit_urb+0xdc/0xa24
[  100.332000] [<8194aac0>] usb_wwan_close+0x298/0x2e8 [usb_wwan]

[  101.440000] CPU: 0 PID: 2254 Comm: gcom Not tainted 3.14.18 #38
[  101.440000] Stack : 00000000 00000000 00000000 00000000 8095cdbe 00000033 81507748 803f0000
          803c1644 8042136f 000008ce 80953b00 81507748 803f0000 80410000 00000020
          7fc48cf0 802f9998 00000000 80211878 00000000 00000000 803c49a4 814c3b14
          00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
          00000000 00000000 00000000 00000000 00000000 00000000 00000000 814c3aa0
          ...
[  101.440000] Call Trace:
[  101.440000] [<80260c70>] show_stack+0x48/0x70
[  101.440000] [<803071d0>] usb_hcd_submit_urb+0xdc/0xa24
[  101.440000] [<8194aac0>] usb_wwan_close+0x298/0x2e8 [usb_wwan]

[1] http://wiki.openwrt.org/toh/tp-link/td-w8970

Regards,
-- 
Vincent Pelletier
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux