EHCI: iaa_watchdog_start() warning followed by NULL ptr dereference in start_unlink_async()

Hi,

I came across an issue where a WARN_ON from iaa_watchdog_start() fires
and, about 10 ms later, a NULL pointer dereference occurs in
start_unlink_async(). It happens exactly here:
	prev = ehci->async;
=>	while (prev->qh_next.qh != qh)
		prev = prev->qh_next.qh;
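
To make the failure mode concrete, here is a minimal userspace sketch of
that walk. It is not the driver code: struct mock_qh and find_prev() are
hypothetical names, and the sketch assumes the software list is
NULL-terminated, which is consistent with prev ending up as 0x0 in the
dump. A walk that checks for NULL returns NULL for a qh that is not on
the list, where the unchecked loop above would dereference it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ehci_qh: only the link. */
struct mock_qh {
    struct mock_qh *next;   /* models qh_next.qh */
};

/*
 * Return the element whose next pointer is target, or NULL if the walk
 * reaches the end of the list without finding target. The extra NULL
 * check is the only difference from the loop in start_unlink_async().
 */
struct mock_qh *find_prev(struct mock_qh *head, const struct mock_qh *target)
{
    struct mock_qh *prev = head;

    while (prev != NULL && prev->next != target)
        prev = prev->next;
    return prev;
}
```

Calling find_prev() with a qh that was never linked (or was already
unlinked) returns NULL, so a caller could warn and bail out at that
point. This only illustrates the symptom; it does not explain how the
qh got off the list.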

Here is the call stack for the warning, followed by the call stack for
the NULL pointer dereference:

<4>[12-07-25 15:38:29.438] WARNING: at /kernel/drivers/usb/host/ehci.h:191
start_unlink_async+0x1cc/0x1f8()
<4>[12-07-25 15:38:29.438] [<c010c694>] (unwind_backtrace+0x0/0x12c) from
[<c0186a10>] (warn_slowpath_common+0x4c/0x64)
<4>[12-07-25 15:38:29.438] [<c0186a10>] (warn_slowpath_common+0x4c/0x64)
from [<c0186a40>] (warn_slowpath_null+0x18/0x1c)
<4>[12-07-25 15:38:29.438] [<c0186a40>] (warn_slowpath_null+0x18/0x1c)
from [<c04c76c8>] (start_unlink_async+0x1cc/0x1f8)
<4>[12-07-25 15:38:29.438] [<c04c76c8>] (start_unlink_async+0x1cc/0x1f8)
from [<c04c74a0>] (end_unlink_async+0x1b4/0x210)
<4>[12-07-25 15:38:29.438] [<c04c74a0>] (end_unlink_async+0x1b4/0x210)
from [<c04cd244>] (ehci_irq+0x1a4/0x4d8)
<4>[12-07-25 15:38:29.438] [<c04cd244>] (ehci_irq+0x1a4/0x4d8) from
[<c04ad3bc>] (usb_hcd_irq+0x30/0x80)
<4>[12-07-25 15:38:29.438] [<c04ad3bc>] (usb_hcd_irq+0x30/0x80) from
[<c01c8ab0>] (handle_irq_event_percpu+0x9c/0x244)
<4>[12-07-25 15:38:29.438] [<c01c8ab0>]
(handle_irq_event_percpu+0x9c/0x244) from [<c01c8c94>]
(handle_irq_event+0x3c/0x5c)
<4>[12-07-25 15:38:29.438] [<c01c8c94>] (handle_irq_event+0x3c/0x5c) from
[<c01cb97c>] (handle_fasteoi_irq+0xd0/0x108)
<4>[12-07-25 15:38:29.438] [<c01cb97c>] (handle_fasteoi_irq+0xd0/0x108)
from [<c01c8590>] (generic_handle_irq+0x28/0x3c)
<4>[12-07-25 15:38:29.438] [<c01c8590>] (generic_handle_irq+0x28/0x3c)
from [<c0106f08>] (handle_IRQ+0x7c/0xc0)
<4>[12-07-25 15:38:29.438] [<c0106f08>] (handle_IRQ+0x7c/0xc0) from
[<c0100410>] (gic_handle_irq+0xac/0x104)
<4>[12-07-25 15:38:29.438] [<c0100410>] (gic_handle_irq+0xac/0x104) from
[<c08106d4>] (__irq_svc+0x54/0x80)


<1>[12-07-25 15:38:29.448] Unable to handle kernel NULL pointer
dereference at virtual address 00000008
<1>[12-07-25 15:38:29.448] pgd = c0004000
<1>[12-07-25 15:38:29.448] [00000008] *pgd=00000000
<0>[12-07-25 15:38:29.448] Internal error: Oops: 17 [#1] PREEMPT SMP
<4>[12-07-25 15:38:29.448] Modules linked in: wlan(P) cfg80211 mwlan_aarp(P)
<4>[12-07-25 15:38:29.448] CPU: 1    Tainted: P        W    (3.0.21 #1)
<4>[12-07-25 15:38:29.448] PC is at start_unlink_async+0xf0/0x1f8
<4>[12-07-25 15:38:29.448] LR is at start_unlink_async+0x1c/0x1f8
<4>[12-07-25 15:38:29.448] pc : <c04c75ec>    lr : <c04c7518>    psr:
00000093

<4>[12-07-25 15:38:29.448] [<c04c75ec>] (start_unlink_async+0xf0/0x1f8)
from [<c04cb384>] (ehci_urb_dequeue+0x84/0x110)
<4>[12-07-25 15:38:29.448] [<c04cb384>] (ehci_urb_dequeue+0x84/0x110) from
[<c04af690>] (unlink1+0xc4/0xd4)
<4>[12-07-25 15:38:29.448] [<c04af690>] (unlink1+0xc4/0xd4) from
[<c04af868>] (usb_hcd_unlink_urb+0x5c/0xc4)
<4>[12-07-25 15:38:29.448] [<c04af868>] (usb_hcd_unlink_urb+0x5c/0xc4)
from [<c04afd44>] (usb_kill_urb+0x4c/0xec)
<4>[12-07-25 15:38:29.448] [<c04afd44>] (usb_kill_urb+0x4c/0xec) from
[<c04b03b4>] (usb_kill_anchored_urbs+0x30/0x58)
<4>[12-07-25 15:38:29.448] [<c04b03b4>] (usb_kill_anchored_urbs+0x30/0x58)
from [<c04e6ea8>] (bridge_suspend+0x4c/0x5c)
<4>[12-07-25 15:38:29.448] [<c04e6ea8>] (bridge_suspend+0x4c/0x5c) from
[<c04b2eec>] (usb_suspend_both+0x7c/0x1c4)
<4>[12-07-25 15:38:29.448] [<c04b2eec>] (usb_suspend_both+0x7c/0x1c4) from
[<c04b3060>] (usb_runtime_suspend+0x2c/0x50)
<4>[12-07-25 15:38:29.448] [<c04b3060>] (usb_runtime_suspend+0x2c/0x50)
from [<c042d8dc>] (rpm_callback+0x44/0x5c)
<4>[12-07-25 15:38:29.448] [<c042d8dc>] (rpm_callback+0x44/0x5c) from
[<c042ddc4>] (rpm_suspend+0x29c/0x4c0)
<4>[12-07-25 15:38:29.448] [<c042ddc4>] (rpm_suspend+0x29c/0x4c0) from
[<c042eda0>] (pm_runtime_work+0x7c/0x98)
<4>[12-07-25 15:38:29.448] [<c042eda0>] (pm_runtime_work+0x7c/0x98) from
[<c019e8e8>] (process_one_work+0x2bc/0x494)
<4>[12-07-25 15:38:29.448] [<c019e8e8>] (process_one_work+0x2bc/0x494)
from [<c019ee9c>] (worker_thread+0x224/0x3e0)
<4>[12-07-25 15:38:29.448] [<c019ee9c>] (worker_thread+0x224/0x3e0) from
[<c01a483c>] (kthread+0x80/0x88)
<4>[12-07-25 15:38:29.448] [<c01a483c>] (kthread+0x80/0x88) from
[<c0106fa0>] (kernel_thread_exit+0x0/0x8)
<0>[12-07-25 15:38:29.448] Code: e5853024 e5943014 e584501c e1a02003
(e5933008)
<4>[12-07-25 15:38:29.528] ---[ end trace da227214a82491ba ]---
<0>[12-07-25 15:38:29.528] Kernel panic - not syncing: Fatal exception

This suggests to me that the qh we are trying to unlink is not part of
the async list maintained by ehci. Here is the state of the ehci_hcd
struct at the time of the crash:

-006|start_unlink_async(
    |    ehci = 0xDE70A948 -> (
    |      caps = 0xE1FDE100,
    |      regs = 0xE1FDE140,
    |      debug = 0x0,
    |      hcs_params = 65553,
    |      lock = (rlock = (raw_lock = (lock = 2997727918))),
    |      async = 0xDCAA4BC0,
    |      dummy = 0x0,
    |      reclaim = 0xDC9FC480,
    |      qh_scan_next = 0x0,
    |      scanning = 0,
    |      periodic_size = 512,
    |      periodic = 0xFFDE6000,
    |      periodic_dma = 2645143552,
    |      i_thresh = 2,
    |      pshadow = 0xDE714800,
    |      next_uframe = 2840,
    |      periodic_sched = 1,
    |      cached_itd_list = (next = 0xDE70A98C, prev = 0xDE70A98C),
    |      cached_sitd_list = (next = 0xDE70A994, prev = 0xDE70A994),
    |      clock_frame = 355,
    |      reset_done = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    |      bus_suspended = 1,
    |      companion_ports = 0,
    |      owned_ports = 0,
    |      port_c_suspend = 0,
    |      suspended_ports = 0,
    |      qh_pool = 0xDE5C7080,
    |      qtd_pool = 0xDE5C7100,
    |      itd_pool = 0xDE5C7000,
    |      sitd_pool = 0xDE5C7480,
    |      iaa_watchdog = (entry = (next = 0x0, prev = 0x00200200),
expires = 2522904, base = 0xCF61A000, f
    |      watchdog = (entry = (next = 0xC0E23088, prev = 0xDCFD39D4),
expires = 2522911, base = 0xC0E22F80
    |      actions = 1,
    |      periodic_stamp = 284245,
    |      random_frame = 6,
    |      next_statechange = 2522629,
    |      last_periodic_enable = (tv64 = 1343172791085200785),
    |      command = 4197124,
    |      max_log2_irq_thresh = 6,
    |      no_selective_suspend = 0,
    |      has_fsl_port_bug = 0,
    |      big_endian_mmio = 0,
    |      big_endian_desc = 0,
    |      big_endian_capbase = 0,
    |      has_amcc_usb23 = 0,
    |      need_io_watchdog = 1,
    |      broken_periodic = 0,
    |      amd_pll_fix = 0,
    |      fs_i_thresh = 0,
    |      use_dummy_qh = 0,
    |      has_synopsys_hc_bug = 0,
    |      frame_index_bug = 0,
    |      susp_sof_bug = 1,
    |      resume_sof_bug = 1,
    |      ohci_hcctrl_reg = 0x0,
    |      has_hostpc = 0,
    |      has_lpm = 0,
    |      has_ppcd = 0,
    |      sbrn = 32,
    |      transceiver = 0x0),
    |    qh = 0xDC9FC480)   <= qh we are trying to unlink
    |  prev = 0x0           <= prev is NULL
    |  __a = 4197173
    |  _ret = 0
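
As a cross-check on the fault address: 00000008 is what a read of
prev->qh_next would touch if prev is NULL and qh_next sits 8 bytes into
struct ehci_qh. The sketch below models that under stated assumptions:
a 32-bit build with 4-byte pointers and a 4-byte dma_addr_t, and hw and
qh_dma as the two fields ahead of qh_next. The field order is an
assumption and should be checked against ehci.h in the tree in question;
struct mock_ehci_qh32 and qh_next_fault_addr() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the leading fields of struct ehci_qh on a
 * 32-bit build. ASSUMPTION: pointers and dma_addr_t are 4 bytes each,
 * and the fields appear in this order.
 */
struct mock_ehci_qh32 {
    uint32_t hw;       /* models: struct ehci_qh_hw *hw */
    uint32_t qh_dma;   /* models: dma_addr_t qh_dma */
    uint32_t qh_next;  /* models: union ehci_shadow qh_next */
};

/* Address that a read of prev->qh_next would touch for a given prev. */
uint32_t qh_next_fault_addr(uint32_t prev)
{
    return prev + (uint32_t)offsetof(struct mock_ehci_qh32, qh_next);
}
```

Under those assumptions, qh_next_fault_addr(0) gives 0x8, which matches
the "virtual address 00000008" in the oops and the prev = 0x0 shown
above.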

This issue was reported when an interface suspend happened as a result
of runtime suspend and our bridge driver called usb_kill_anchored_urbs().
The bridge driver queues 50 rx URBs when it resumes and unlinks them
during suspend.

This issue is very hard to reproduce (it takes around a week to show
up), so I tried to analyze it statically from the RAM dump, but I
couldn't find a code path that can produce this behavior.

Can someone please suggest what could cause this, or let me know if
this is a known issue?

Appreciate your suggestions.

Thanks,
Hemant



Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm
Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by
The Linux Foundation.

