Hello again. The issue still bites me with 4.0.2. Though I was able to narrow down the issue, it's still hard to deterministically trigger it. Anyway. The setup: - OHCI controller with hooked up keyboard - xhddled (program that toggles scrolllock on disk activity) - disk activity I'm very certain that it happens, when usbhid issues an urb for scrolllock led toggling, the ohci driver adding a duplicate to its eds_in_use list. That's why I hit it way more often with disk activity. I also tried to alter the program to just periodically toggle the scrolllock, but that didn't seem to help for reproducibility. On Friday 23 January 2015 15:21:11 Alan Stern wrote: > On Tue, 20 Jan 2015, Heiko Przybyl wrote: > > I'm not 100% sure, but then it's probably a race between urb > > enqueuing (duplicates?) and the watchdog orphan cleanup. > > > > The crash log already shows the double list add in ohci_urb_enqueue > > " > > ohci-hcd.c:238: list_add(&ed->in_use_list, &ohci->eds_in_use); > > " > > This is probably due to the ed returned by ed_get() being reused before > > the > > watchdog ran, thus the same in_use_list re-added to ohci.eds_in_use. > > But this won't happen unless ed->state was ED_IDLE; and then > ed_schedule() changes ed->state to ED_OPER before ed is added to the > in_use_list. As suggested, checking the eds_in_use list was a good idea. Now, I'm even more convinced duplicates are being added to the eds_in_use list. Check right before the list_add in ohci_urb_enqueue(): [73942.398966] 12.0: oc(pre add): elem ffff88009faff040 not expected but FOUND [73942.407393] 12.0: e ffff88009faff040 e.prev ffff8800c9c845b8 e.next ffff88009faff120 [73942.416710] 12.0: e ffff88009faff120 e.prev ffff88009faff040 e.next ffff88009faff0b0 [73942.426044] 12.0: e ffff88009faff0b0 e.prev ffff88009faff120 e.next ffff8800c9c845b8 [73942.436752] 12.0: oc(pre add): EOL [73942.675303] Call Trace: [73942.677779] <IRQ> [73942.679755] [<ffffffffa001bad0>] ohci_check+0x250/0x250 [ohci_hcd] [73942.686313] [<ffffffffa00215ac>] ohci_urb_enqueue+0x84c/0xb9a [ohci_hcd] [73942.693127] [<ffffffff810ac079>] ? select_task_rq_fair+0x469/0x7f0 [73942.699489] [<ffffffff813ba4f9>] usb_hcd_submit_urb+0xa1/0xae0 [73942.705427] [<ffffffff810afb5d>] ? update_group_capacity+0x24d/0x278 [73942.711929] [<ffffffff8128e08f>] ? cpumask_next_and+0x2f/0x40 [73942.717791] [<ffffffff813bc246>] usb_submit_urb+0x42e/0x5e8 [73942.723491] [<ffffffffa01ef6ab>] hid_submit_ctrl+0x12b/0x290 [usbhid] [73942.730042] [<ffffffffa01f07c3>] hid_ctrl+0xeb/0x170 [usbhid] [73942.735929] [<ffffffff813b897c>] __usb_hcd_giveback_urb+0x64/0xe8 [73942.742165] [<ffffffff813b8b0e>] usb_hcd_giveback_urb+0x3e/0x130 [73942.748267] [<ffffffffa001b734>] finish_urb+0x74/0x158 [ohci_hcd] [73942.754459] [<ffffffffa001d1e4>] ohci_work.part.4+0x26c/0x5f8 [ohci_hcd] [73942.761272] [<ffffffffa0020c60>] ohci_irq+0x218/0x318 [ohci_hcd] [73942.767394] [<ffffffff813b8399>] usb_hcd_irq+0x21/0x38 [73942.772653] [<ffffffff810c7048>] handle_irq_event_percpu+0x60/0x108 [73942.779029] [<ffffffff810c712c>] handle_irq_event+0x3c/0x60 [73942.784718] [<ffffffff810c9d21>] handle_fasteoi_irq+0x81/0x130 [73942.790675] [<ffffffff81004acd>] handle_irq+0x1d/0x40 [73942.795844] [<ffffffff810044aa>] do_IRQ+0x4a/0xe0 [73942.800669] [<ffffffff814c28aa>] common_interrupt+0x6a/0x6a When checking after the list_add, there's a loop in the linked list, making it endless and probably locking the machine. 12.0: e ffff8800c9993120 e.prev ffff8800cbb065b8 e.next ffff8800c9993120 12.0: list corruption: p(ffff8800c9993120) != e.prev(ffff8800cbb065b8) [21042.467670] Call Trace: [21042.470162] <IRQ> [21042.472168] [<ffffffffa0056646>] ohci_urb_enqueue+0x876/0xb9a [ohci_hcd] [21042.479327] [<ffffffff810b63ff>] ? __wake_up_sync_key+0x4f/0x60 [21042.485474] [<ffffffff810ac079>] ? select_task_rq_fair+0x469/0x7f0 [21042.491846] [<ffffffff813ba4f9>] usb_hcd_submit_urb+0xa1/0xae0 [21042.497873] [<ffffffff810b5f73>] ? __wake_up+0x43/0x58 [21042.503231] [<ffffffff813712c8>] ? rpm_idle+0x18/0x148 [21042.508530] [<ffffffff813bc246>] usb_submit_urb+0x42e/0x5e8 [21042.514302] [<ffffffffa00e46ab>] hid_submit_ctrl+0x12b/0x290 [usbhid] [21042.520964] [<ffffffff810b5f73>] ? __wake_up+0x43/0x58 [21042.526300] [<ffffffffa00e57c3>] hid_ctrl+0xeb/0x170 [usbhid] [21042.532202] [<ffffffff813b897c>] __usb_hcd_giveback_urb+0x64/0xe8 [21042.538497] [<ffffffff813b8b0e>] usb_hcd_giveback_urb+0x3e/0x130 [21042.544689] [<ffffffffa0050734>] finish_urb+0x74/0x158 [ohci_hcd] [21042.551005] [<ffffffffa0052254>] ohci_work.part.4+0x26c/0x5f8 [ohci_hcd] [21042.557940] [<ffffffffa0055cd0>] ohci_irq+0x218/0x318 [ohci_hcd] [21042.564152] [<ffffffff813b8399>] usb_hcd_irq+0x21/0x38 [21042.569485] [<ffffffff810c7048>] handle_irq_event_percpu+0x60/0x108 [21042.575968] [<ffffffff810c712c>] handle_irq_event+0x3c/0x60 [21042.581727] [<ffffffff810c9d21>] handle_fasteoi_irq+0x81/0x130 [21042.587781] [<ffffffff81004acd>] handle_irq+0x1d/0x40 [21042.593027] [<ffffffff810044aa>] do_IRQ+0x4a/0xe0 [21042.597909] [<ffffffff814c28aa>] common_interrupt+0x6a/0x6a > > And finish_unlinks() is where ed->state gets set back to ED_IDLE. > Furthermore, all these operations take place under the protection of > ohci->lock, so they can't race with each other. > > They have to happen in strict sequence: list_add, list_del, list_add. > The second list_add can't occur until after the list_del, because > ed->state won't be equal to ED_IDLE until then. > > At least, that's how it's _supposed_ to work... Well, finish_urb() still does unlock the ohci->lock before calling usb_hcd_giveback_urb(), because the completion routine can reenter. It then locks again after that call. Unfortunately, usbhid's completion routine hid_ctrl() does infact issue another urb on the same ed. So I'd say there's a small time slot where things can go awkward, if the newly submitted urb arrives before the ohci lock gets locked again in finish_urb. What I'm driving at is something in the way of: finish_urb() _unlock() _usb_hcd_giveback_urb() __hid_ctrl() ___urb_enqueue() ____lock() ____list_add() ____unlock() _lock() list_del() Would it be feasible to run usb_hcd_giveback_urb() after the cleanup parts? > > It's a puzzling problem. I hope you can find the cause. Well, I'm puzzled and I hope I'm pointing at the right location :-) > > Alan Stern Kind regards, Heiko -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html