On Tuesday 20 January 2015 10:49:29 Alan Stern wrote: > On Mon, 19 Jan 2015, Heiko Przybyl wrote: > > On Monday 19 January 2015 11:17:59 Alan Stern wrote: > > > > > > That's easy enough to test. All you have to do is change the > > > spin_lock/unlock statements to their irq_save/restore variants. > > > > Well, thought about that as well, but I'm not sure when to take it as > > fixed and when to take it as issue-just-didn't-happen-yet, because of the > > not-so- deterministic occurrence of the error. But I can try it out > > anyway, just wanted to have some feedback before trying. > > By the way, failing to disable interrupts when acquiring a spinlock > generally does not lead to data corruption -- it leads to deadlocks. > So I doubt this is the cause of your problem. If you really want to, > you could add a > > WARN_ON(!irqs_disabled()); > > line to ohci_irq(). You're right. After thinking/reading a bit more about the topic, not disabled IRQs would cause deadlocks, because there's already one thread in the critical section. > > No idea. > > It might be a good idea for you to try something a little more > invasive. How about writing a routine to check the entire > ohci->eds_in_use list for validity (each forward pointer is matched by > the corresponding backward pointer), and calling this routine at each > place where the list gets modified, before the modification happens? > > You could also make sure that an entry being added to the list isn't on > the list already, and whenever an entry is deleted from the list > either it really is on the list or else its list pointers point to > themselves. > I'm not 100% sure, but then it's probably a race between urb enqueuing (duplicates?) and the watchdog orphan cleanup. The crash log already shows the double list add in ohci_urb_enqueue " ohci-hcd.c:238: list_add(&ed->in_use_list, &ohci->eds_in_use); " This is probably due to the ed returned by ed_get() being reused before the watchdog ran, thus the same in_use_list re-added to ohci.eds_in_use. Entries seem to get removed in finish_unlinks() " ohci-q.c:1090: list_del(&ed->in_use_list); " with list_del() poisoning the next/prev pointers of the removed entry. Now with the watchdog starting cleanup it iterates over the ohci.eds_in_use list that still has the second very same entry of in_use_list we double-added (but now with 0xdead... pointers) and we fault on " ohci-hcd.c:761: if (ed->pending_td) { " I hope that makes any sense? I'll hook up the list checking tomorrow. Though I haven't hit the (double-add) problem again, since the bug report. Seems pretty specific the whole thing. > Alan Stern Heiko -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html