Re: ohci: sporadic crash/lockup in ohci-hcd io_watchdog_func()

Heiko Przybyl <lil_tux@xxxxxx> · Mon, 19 Jan 2015 19:31:56 +0100

On Monday 19 January 2015 11:17:59 Alan Stern wrote:
> On Mon, 19 Jan 2015, Heiko Przybyl wrote:
> > It seems to be related to keyboard input (at least it happens when using
> > the keyboard), without relation to system load. Can happen within a day
> > after boot or after several days of hibernated uptime. Unfortunately, I
> > haven't found a way to reliably reproduce the issue, yet.
> > 
> > [..]
> > 
> > My (pretty wild) guess is, that the corruption happens through a race in
> > the interrupt handler ohci_irq(), which calls ohci_work(), which calls
> > finish_urb(), which states:
> > " * PRECONDITION:  ohci lock held, irqs blocked"
> > 
> > But ohci_irq() seems to only spin_[un]lock(), not spin_[un]lock_irq[save|
> > restore](). All other functions that call ohci_work() do at least
> > spin_[un]lock_irq. So irqs could still be enabled and possibly the event
> > triggered twice, thus the double list add?
> 
> That's easy enough to test.  All you have to do is change the
> spin_lock/unlock statements to their irq_save/restore variants.

Well, I thought about that as well, but because the error occurs so
non-deterministically, I'm not sure when to consider it fixed and when it
simply hasn't happened yet. I can try it out anyway; I just wanted some
feedback before trying.

> 
> ohci_irq() is an interrupt handler.  In the absence of threaded IRQs,
> the kernel should always call interrupt handlers with interrupts
> disabled.  Do you specify "threadirqs" on your boot command line?
> 

Never used "threadirqs".

# cat /proc/cmdline 
BOOT_IMAGE=/boot/gentoo root=/dev/sda2 ro rootfstype=ext4 resume=/dev/sda3 
init=/usr/lib/systemd/systemd quiet libahci.ignore_sss=1 i8042.nopnp 
crashkernel=64M

> If that's not the explanation then we'll have to dig deeper.

I can still work on a saved vmcore dump of a crash. By the way, crash(1)'s
`bt -E` command shows two kernel-mode exceptions, though I can't figure out
where the first one originates:

CPU 3 IRQ STACK:
  KERNEL-MODE EXCEPTION FRAME AT: ffff88022ecc3638
    [exception RIP: _raw_spin_unlock_irqrestore+9]
    RIP: ffffffff814774b9  RSP: ffff88022ecc36e8  RFLAGS: 00000202
    RAX: ffff88022ecc36a8  RBX: ffff88022ecc36b0  RCX: ffffffff81290279
    RDX: 0000000000002dff  RSI: 0000000000000000  RDI: ffff88022ecc3788
    RBP: ffff88022ecc36e8   R8: 0000000000000080   R9: 0000000000000023
    R10: ffffffff813e6407  R11: ffffea000863ad80  R12: ffff88022ecc3658
    R13: ffffffff81478b2a  R14: ffff88022ecc36e8  R15: 0000000000000001
    ORIG_RAX: ffffffff81471cfd  CS: 0010  SS: 0018

    0xffffffff814774b9 <+9>:     decl   %gs:0xa860

CPU 5 IRQ STACK:
  KERNEL-MODE EXCEPTION FRAME AT: ffff88022ed43d98
    [exception RIP: io_watchdog_func+112]
    RIP: ffffffff81394b80  RSP: ffff88022ed43e48  RFLAGS: 00010006
    RAX: ffff8800cb8aa598  RBX: 0000000000000296  RCX: ffff8800cbaa8030
    RDX: dead000000100100  RSI: 00000000cbaa91e0  RDI: ffff8800cbaa8030
    RBP: ffff88022ed43e88   R8: ffff8800cbaa7fe8   R9: 0000000000000205
    R10: ffff8800cbaa8030  R11: ffff8800cb8aa5a0  R12: dead0000001000c0
    R13: ffff8800cb8aa248  R14: ffff8800cb8aa5b8  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
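
To make the two frames easier to read, here is a small user-space sketch that
decodes the values above. It is only a sanity check, not part of any fix. The
helper names irqs_enabled() and near_list_poison() are made up for this mail.
The poison constants assume x86_64 with
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000 and the LIST_POISON1/2
definitions from include/linux/poison.h as found in 3.x kernels; other
configurations use different values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 9 of x86 RFLAGS is IF, the interrupt-enable flag. */
#define X86_RFLAGS_IF (UINT64_C(1) << 9)

/*
 * ASSUMPTION: x86_64, CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000,
 * list poison values as in include/linux/poison.h of 3.x kernels.
 */
#define POISON_POINTER_DELTA UINT64_C(0xdead000000000000)
#define LIST_POISON1 (UINT64_C(0x00100100) + POISON_POINTER_DELTA)
#define LIST_POISON2 (UINT64_C(0x00200200) + POISON_POINTER_DELTA)

/* Hypothetical helper: true if the saved RFLAGS had interrupts enabled. */
static bool irqs_enabled(uint64_t rflags)
{
	return (rflags & X86_RFLAGS_IF) != 0;
}

/*
 * Hypothetical helper: returns 1 or 2 if `value` equals LIST_POISON1 or
 * LIST_POISON2, or lies at most `window` bytes below it (as a
 * container_of() result on a poisoned list pointer would). The distance
 * below the poison value is stored in *offset. Returns 0 otherwise.
 */
static int near_list_poison(uint64_t value, uint64_t window, uint64_t *offset)
{
	const uint64_t poison[2] = { LIST_POISON1, LIST_POISON2 };

	assert(offset != NULL);
	for (int i = 0; i < 2; i++) {
		if (value <= poison[i] && poison[i] - value <= window) {
			*offset = poison[i] - value;
			return i + 1;
		}
	}
	return 0;
}
```

Fed with the registers above, irqs_enabled() is true for the CPU 3 frame
(RFLAGS 00000202) and false for the CPU 5 frame (RFLAGS 00010006), so
interrupts were off on CPU 5 when io_watchdog_func() faulted. RDX on CPU 5
equals LIST_POISON1 under the stated assumption, and R12 sits 0x40 below it.
That would be consistent with walking a list entry that had already been
list_del()'d, but I haven't verified the struct offset.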

> Alan Stern

Kind regards,

   Heiko