Hi,

I came across an issue where I see a WARN_ON from iaa_watchdog_start(), and almost 10 ms later a NULL pointer dereference in start_unlink_async(). It happens exactly here:

        prev = ehci->async;
     => while (prev->qh_next.qh != qh)
                prev = prev->qh_next.qh;

Here is the call trace for the warning, followed by the call trace for the NULL pointer dereference:

<4>[12-07-25 15:38:29.438] WARNING: at /kernel/drivers/usb/host/ehci.h:191 start_unlink_async+0x1cc/0x1f8()
<4>[12-07-25 15:38:29.438] [<c010c694>] (unwind_backtrace+0x0/0x12c) from [<c0186a10>] (warn_slowpath_common+0x4c/0x64)
<4>[12-07-25 15:38:29.438] [<c0186a10>] (warn_slowpath_common+0x4c/0x64) from [<c0186a40>] (warn_slowpath_null+0x18/0x1c)
<4>[12-07-25 15:38:29.438] [<c0186a40>] (warn_slowpath_null+0x18/0x1c) from [<c04c76c8>] (start_unlink_async+0x1cc/0x1f8)
<4>[12-07-25 15:38:29.438] [<c04c76c8>] (start_unlink_async+0x1cc/0x1f8) from [<c04c74a0>] (end_unlink_async+0x1b4/0x210)
<4>[12-07-25 15:38:29.438] [<c04c74a0>] (end_unlink_async+0x1b4/0x210) from [<c04cd244>] (ehci_irq+0x1a4/0x4d8)
<4>[12-07-25 15:38:29.438] [<c04cd244>] (ehci_irq+0x1a4/0x4d8) from [<c04ad3bc>] (usb_hcd_irq+0x30/0x80)
<4>[12-07-25 15:38:29.438] [<c04ad3bc>] (usb_hcd_irq+0x30/0x80) from [<c01c8ab0>] (handle_irq_event_percpu+0x9c/0x244)
<4>[12-07-25 15:38:29.438] [<c01c8ab0>] (handle_irq_event_percpu+0x9c/0x244) from [<c01c8c94>] (handle_irq_event+0x3c/0x5c)
<4>[12-07-25 15:38:29.438] [<c01c8c94>] (handle_irq_event+0x3c/0x5c) from [<c01cb97c>] (handle_fasteoi_irq+0xd0/0x108)
<4>[12-07-25 15:38:29.438] [<c01cb97c>] (handle_fasteoi_irq+0xd0/0x108) from [<c01c8590>] (generic_handle_irq+0x28/0x3c)
<4>[12-07-25 15:38:29.438] [<c01c8590>] (generic_handle_irq+0x28/0x3c) from [<c0106f08>] (handle_IRQ+0x7c/0xc0)
<4>[12-07-25 15:38:29.438] [<c0106f08>] (handle_IRQ+0x7c/0xc0) from [<c0100410>] (gic_handle_irq+0xac/0x104)
<4>[12-07-25 15:38:29.438] [<c0100410>] (gic_handle_irq+0xac/0x104) from [<c08106d4>] (__irq_svc+0x54/0x80)

<1>[12-07-25 15:38:29.448] Unable to handle kernel NULL pointer dereference at virtual address 00000008
<1>[12-07-25 15:38:29.448] pgd = c0004000
<1>[12-07-25 15:38:29.448] [00000008] *pgd=00000000
<0>[12-07-25 15:38:29.448] Internal error: Oops: 17 [#1] PREEMPT SMP
<4>[12-07-25 15:38:29.448] Modules linked in: wlan(P) cfg80211 mwlan_aarp(P)
<4>[12-07-25 15:38:29.448] CPU: 1 Tainted: P W (3.0.21 #1)
<4>[12-07-25 15:38:29.448] PC is at start_unlink_async+0xf0/0x1f8
<4>[12-07-25 15:38:29.448] LR is at start_unlink_async+0x1c/0x1f8
<4>[12-07-25 15:38:29.448] pc : <c04c75ec> lr : <c04c7518> psr: 00000093
<4>[12-07-25 15:38:29.448] [<c04c75ec>] (start_unlink_async+0xf0/0x1f8) from [<c04cb384>] (ehci_urb_dequeue+0x84/0x110)
<4>[12-07-25 15:38:29.448] [<c04cb384>] (ehci_urb_dequeue+0x84/0x110) from [<c04af690>] (unlink1+0xc4/0xd4)
<4>[12-07-25 15:38:29.448] [<c04af690>] (unlink1+0xc4/0xd4) from [<c04af868>] (usb_hcd_unlink_urb+0x5c/0xc4)
<4>[12-07-25 15:38:29.448] [<c04af868>] (usb_hcd_unlink_urb+0x5c/0xc4) from [<c04afd44>] (usb_kill_urb+0x4c/0xec)
<4>[12-07-25 15:38:29.448] [<c04afd44>] (usb_kill_urb+0x4c/0xec) from [<c04b03b4>] (usb_kill_anchored_urbs+0x30/0x58)
<4>[12-07-25 15:38:29.448] [<c04b03b4>] (usb_kill_anchored_urbs+0x30/0x58) from [<c04e6ea8>] (bridge_suspend+0x4c/0x5c)
<4>[12-07-25 15:38:29.448] [<c04e6ea8>] (bridge_suspend+0x4c/0x5c) from [<c04b2eec>] (usb_suspend_both+0x7c/0x1c4)
<4>[12-07-25 15:38:29.448] [<c04b2eec>] (usb_suspend_both+0x7c/0x1c4) from [<c04b3060>] (usb_runtime_suspend+0x2c/0x50)
<4>[12-07-25 15:38:29.448] [<c04b3060>] (usb_runtime_suspend+0x2c/0x50) from [<c042d8dc>] (rpm_callback+0x44/0x5c)
<4>[12-07-25 15:38:29.448] [<c042d8dc>] (rpm_callback+0x44/0x5c) from [<c042ddc4>] (rpm_suspend+0x29c/0x4c0)
<4>[12-07-25 15:38:29.448] [<c042ddc4>] (rpm_suspend+0x29c/0x4c0) from [<c042eda0>] (pm_runtime_work+0x7c/0x98)
<4>[12-07-25 15:38:29.448] [<c042eda0>] (pm_runtime_work+0x7c/0x98) from [<c019e8e8>] (process_one_work+0x2bc/0x494)
<4>[12-07-25 15:38:29.448] [<c019e8e8>] (process_one_work+0x2bc/0x494) from [<c019ee9c>] (worker_thread+0x224/0x3e0)
<4>[12-07-25 15:38:29.448] [<c019ee9c>] (worker_thread+0x224/0x3e0) from [<c01a483c>] (kthread+0x80/0x88)
<4>[12-07-25 15:38:29.448] [<c01a483c>] (kthread+0x80/0x88) from [<c0106fa0>] (kernel_thread_exit+0x0/0x8)
<0>[12-07-25 15:38:29.448] Code: e5853024 e5943014 e584501c e1a02003 (e5933008)
<4>[12-07-25 15:38:29.528] ---[ end trace da227214a82491ba ]---
<0>[12-07-25 15:38:29.528] Kernel panic - not syncing: Fatal exception

This suggests to me that the qh we are trying to unlink is not part of the async list maintained by ehci. Here is the state of the ehci_hcd struct at the time of the crash:

-006|start_unlink_async(
    |  ehci = 0xDE70A948 -> (
    |    caps = 0xE1FDE100,
    |    regs = 0xE1FDE140,
    |    debug = 0x0,
    |    hcs_params = 65553,
    |    lock = (rlock = (raw_lock = (lock = 2997727918))),
    |    async = 0xDCAA4BC0,
    |    dummy = 0x0,
    |    reclaim = 0xDC9FC480,
    |    qh_scan_next = 0x0,
    |    scanning = 0,
    |    periodic_size = 512,
    |    periodic = 0xFFDE6000,
    |    periodic_dma = 2645143552,
    |    i_thresh = 2,
    |    pshadow = 0xDE714800,
    |    next_uframe = 2840,
    |    periodic_sched = 1,
    |    cached_itd_list = (next = 0xDE70A98C, prev = 0xDE70A98C),
    |    cached_sitd_list = (next = 0xDE70A994, prev = 0xDE70A994),
    |    clock_frame = 355,
    |    reset_done = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    |    bus_suspended = 1,
    |    companion_ports = 0,
    |    owned_ports = 0,
    |    port_c_suspend = 0,
    |    suspended_ports = 0,
    |    qh_pool = 0xDE5C7080,
    |    qtd_pool = 0xDE5C7100,
    |    itd_pool = 0xDE5C7000,
    |    sitd_pool = 0xDE5C7480,
    |    iaa_watchdog = (entry = (next = 0x0, prev = 0x00200200), expires = 2522904, base = 0xCF61A000, f
    |    watchdog = (entry = (next = 0xC0E23088, prev = 0xDCFD39D4), expires = 2522911, base = 0xC0E22F80
    |    actions = 1,
    |    periodic_stamp = 284245,
    |    random_frame = 6,
    |    next_statechange = 2522629,
    |    last_periodic_enable = (tv64 = 1343172791085200785),
    |    command = 4197124,
    |    max_log2_irq_thresh = 6,
    |    no_selective_suspend = 0,
    |    has_fsl_port_bug = 0,
    |    big_endian_mmio = 0,
    |    big_endian_desc = 0,
    |    big_endian_capbase = 0,
    |    has_amcc_usb23 = 0,
    |    need_io_watchdog = 1,
    |    broken_periodic = 0,
    |    amd_pll_fix = 0,
    |    fs_i_thresh = 0,
    |    use_dummy_qh = 0,
    |    has_synopsys_hc_bug = 0,
    |    frame_index_bug = 0,
    |    susp_sof_bug = 1,
    |    resume_sof_bug = 1,
    |    ohci_hcctrl_reg = 0x0,
    |    has_hostpc = 0,
    |    has_lpm = 0,
    |    has_ppcd = 0,
    |    sbrn = 32,
    |    transceiver = 0x0),
    |  qh = 0xDC9FC480)    <= qh we are trying to unlink
    |  prev = 0x0          <= prev is NULL
    |  __a = 4197173
    |  _ret = 0

The issue was reported when an interface suspend happened as a result of runtime suspend and our bridge driver called usb_kill_anchored_urbs(). The bridge driver queues 50 rx URBs when it resumes and unlinks them during suspend.

This issue is very hard to reproduce (it takes around a week to show up), so I tried to analyze it statically from the RAM dump, but I couldn't find a code path that would produce this behavior.

Can someone please provide some pointers on what could cause this, or say whether this is a known issue? I appreciate your suggestions.

Thanks,
Hemant

Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation.
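
For reference, the failing walk can be reproduced in a small userspace sketch. This is not the ehci-hcd code: struct qh, find_prev() and unlink_qh() are hypothetical stand-ins, and the sketch assumes the software list reached through qh_next.qh is NULL-terminated. Under that assumption, searching for a QH that is not on the list walks off the end and leaves prev NULL, which matches prev = 0x0 in the dump; the fault address 00000008 would then be consistent with reading a member at a small offset from NULL. Note also that reclaim and qh hold the same address (0xDC9FC480) in the dump, which may mean the QH had already been unlinked and was waiting for IAA; that is a reading of the dump, not a confirmed diagnosis.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ehci_qh:
 * only the software link used by the list walk is modelled. */
struct qh {
    struct qh *next;   /* plays the role of qh_next.qh */
    int id;            /* illustration only */
};

/* Find the predecessor of target, starting at head (the role of
 * ehci->async). Unlike the loop quoted in the report, this stops at
 * the end of the list and returns NULL when target is not reachable. */
static struct qh *find_prev(struct qh *head, const struct qh *target)
{
    struct qh *prev = head;

    while (prev != NULL && prev->next != target)
        prev = prev->next;
    return prev;
}

/* Unlink target from the list. Returns 0 on success, or -1 when
 * target is not on the list (for example, already unlinked). */
static int unlink_qh(struct qh *head, struct qh *target)
{
    struct qh *prev = find_prev(head, target);

    if (prev == NULL)
        return -1;     /* the unguarded loop would dereference NULL here */
    prev->next = target->next;
    target->next = NULL;
    return 0;
}
```

With a list head -> a -> b and a stray node that was never linked, find_prev() returns &a for b and NULL for the stray node, so unlink_qh() rejects the second unlink instead of crashing. A guard like this only hides the symptom; it does not explain how the same QH came to be unlinked twice.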