Re: ohci: sporadic crash/lockup in ohci-hcd io_watchdog_func()

Heiko Przybyl <lil_tux@xxxxxx> · Tue, 20 Jan 2015 21:09:34 +0100

On Tuesday 20 January 2015 10:49:29 Alan Stern wrote:
> On Mon, 19 Jan 2015, Heiko Przybyl wrote:
> > On Monday 19 January 2015 11:17:59 Alan Stern wrote:
> > > 
> > > That's easy enough to test.  All you have to do is change the
> > > spin_lock/unlock statements to their irq_save/restore variants.
> > 
> > Well, thought about that as well, but I'm not sure when to take it as
> > fixed and when to take it as issue-just-didn't-happen-yet, because of the
> > not-so-deterministic occurrence of the error. But I can try it out
> > anyway, just wanted to have some feedback before trying.
> 
> By the way, failing to disable interrupts when acquiring a spinlock
> generally does not lead to data corruption -- it leads to deadlocks.
> So I doubt this is the cause of your problem.  If you really want to,
> you could add a
> 
> 	WARN_ON(!irqs_disabled());
> 
> line to ohci_irq().

You're right. After thinking/reading a bit more about the topic: leaving IRQs
enabled would cause deadlocks, not corruption, because the interrupt handler
would spin on a lock that is already held by the code it interrupted.

> 
> No idea.
> 
> It might be a good idea for you to try something a little more
> invasive.  How about writing a routine to check the entire
> ohci->eds_in_use list for validity (each forward pointer is matched by
> the corresponding backward pointer), and calling this routine at each
> place where the list gets modified, before the modification happens?
> 
> You could also make sure that an entry being added to the list isn't on
> the list already, and whenever an entry is deleted from the list
> either it really is on the list or else its list pointers point to
> themselves.
> 

I'm not 100% sure, but then it's probably a race between urb 
enqueuing (duplicates?) and the watchdog orphan cleanup.

The crash log already shows the double list add in ohci_urb_enqueue
"
ohci-hcd.c:238: list_add(&ed->in_use_list, &ohci->eds_in_use);
"
This is probably due to the ed returned by ed_get() being reused before the
watchdog ran, so the same in_use_list entry gets added to ohci->eds_in_use a
second time.

Entries seem to get removed in finish_unlinks()
"
ohci-q.c:1090: list_del(&ed->in_use_list);
"
with list_del() poisoning the next/prev pointers of the removed entry.

Now, when the watchdog starts its cleanup, it iterates over the
ohci->eds_in_use list, which still contains the second copy of the very same
in_use_list entry we double-added (but now with 0xdead... pointers), and we
fault on
"
ohci-hcd.c:761: if (ed->pending_td) {
"
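
To illustrate the sequence above, here is a minimal userspace sketch. It
re-implements list_add()/list_del() along the lines of the kernel's list.h
rather than using the kernel code itself; the poison values and the function
double_add_leaves_poisoned_entry() are made up for the example, and it only
covers the case where the entry is still first on the list when it gets added
the second time.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Illustrative poison values; the kernel's real LIST_POISON1/2 differ. */
#define POISON_NEXT ((struct list_head *)(uintptr_t)0xdead0100)
#define POISON_PREV ((struct list_head *)(uintptr_t)0xdead0200)

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head, without any "already on a list" check. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = entry;
	entry->next = next;
	entry->prev = head;
	head->next = entry;
}

/* Unlink entry from its neighbours and poison its own pointers. */
static void list_del(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = POISON_NEXT;
	entry->prev = POISON_PREV;
}

/*
 * Hypothetical demo: returns 1 if, after add + add + del, the list head
 * still points at the entry while the entry's next pointer is poisoned.
 */
static int double_add_leaves_poisoned_entry(void)
{
	struct list_head eds_in_use, ed;

	list_init(&eds_in_use);
	list_add(&ed, &eds_in_use);
	list_add(&ed, &eds_in_use);	/* double add: ed.next now points at ed */
	list_del(&ed);			/* "unlinks" ed from itself, then poisons it */

	return eds_in_use.next == &ed && ed.next == POISON_NEXT;
}
```

After the second list_add() the entry's next pointer refers to itself, so
list_del() cannot detach it from the head. A later walk of the list reaches
the entry and then follows the poisoned next pointer, which would fit a fault
while dereferencing ed in the watchdog loop.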

I hope that makes sense. I'll hook up the list checking tomorrow, though I
haven't hit the (double-add) problem again since the bug report. The whole
thing seems to need pretty specific conditions.

> Alan Stern

Heiko