Resume crash: MUSB interrupt routine interactions with omap2430_musb_set_vbus()

Tim Nordell <tim.nordell@xxxxxxxxxxx> · Thu, 06 Sep 2012 08:35:47 -0500

All -

We've been doing some suspend/resume testing and found that on occasion 
(on the order of 1 in 5000 cycles) the system would lock up.  The 
problem was traced to the MUSB subsystem.  Specifically, the interrupt 
requested inside musb_core.c is of the non-threaded type (i.e. its 
handler runs in interrupt context).

...
        /* attach to the IRQ */
        if (request_irq(nIrq, musb->isr, 0, dev_name(dev), musb)) {
                dev_err(dev, "request_irq %d failed!\n", nIrq);
                status = -ENODEV;
                goto fail3;
        }
...

Later, still in interrupt context, the routine musb_stage0_irq() makes 
the following call:

...
        /* see manual for the order of the tests */
        if (int_usb & MUSB_INTR_SESSREQ) {
...
                musb_platform_set_vbus(musb, 1);
...
        }
...

which in turn calls

static void omap2430_musb_set_vbus(struct musb *musb, int is_on)
{
        struct usb_otg  *otg = musb->xceiv->otg;
        u8              devctl;
        unsigned long timeout = jiffies + msecs_to_jiffies(1000);
...
                        while (musb_readb(musb->mregs, MUSB_DEVCTL) & 0x80) {

                                cpu_relax();

                                if (time_after(jiffies, timeout)) {
                                        dev_err(musb->controller,
                                        "configured as A device timeout");
                                        ret = -EINVAL;
                                        break;
                                }
                        }
...

When the system gets into that routine, it is in response to a spurious 
event, i.e. nothing should actually have triggered the interrupt 
(nothing is plugged into the USB port).  If the timeout were 
functional, the loop would eventually have timed out, but jiffies is 
not incrementing in that context.  Additionally, 1 second is a _long_ 
time to wait in an interrupt routine that is not threaded.
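
To illustrate the failure mode, here is a minimal userspace model (not 
kernel code, and not a proposed patch).  Everything in it is a 
hypothetical stand-in: fake_hw, fake_readb() and both poll functions are 
made up for the example, and the tick counter is deliberately never 
advanced to mimic jiffies standing still.  The time-based loop can only 
exit through an artificial guard, while a loop bounded by a poll count 
terminates regardless of the clock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for bit 7 (0x80) of DEVCTL, the bit the quoted
 * loop in omap2430_musb_set_vbus() polls. */
#define FAKE_DEVCTL_BIT7 0x80u

struct fake_hw {
        uint8_t       devctl;  /* simulated register value */
        unsigned long ticks;   /* simulated jiffies; never advanced here */
        unsigned long reads;   /* number of register reads performed */
};

static uint8_t fake_readb(struct fake_hw *hw)
{
        hw->reads++;
        return hw->devctl;
}

/*
 * Time-based variant modelled on the quoted loop.  'guard' exists only
 * so that this demo terminates; the real loop has no such escape.
 * Returns 0 if the bit cleared, -1 on timeout, -2 if the guard was hit
 * (i.e. the loop would otherwise have spun forever).
 */
int poll_bit7_clear_timed(struct fake_hw *hw, unsigned long timeout_ticks,
                          unsigned long guard)
{
        unsigned long deadline = hw->ticks + timeout_ticks;

        while (fake_readb(hw) & FAKE_DEVCTL_BIT7) {
                /* same comparison idea as time_after(jiffies, timeout) */
                if ((long)(deadline - hw->ticks) < 0)
                        return -1;
                if (hw->reads >= guard)
                        return -2;
        }
        return 0;
}

/*
 * Count-bounded variant: performs at most max_polls register reads, so
 * it terminates even when the clock does not advance.
 * Returns 0 if the bit cleared, -1 if the poll budget ran out.
 */
int poll_bit7_clear_bounded(struct fake_hw *hw, unsigned int max_polls)
{
        assert(max_polls > 0);

        while (max_polls--) {
                if (!(fake_readb(hw) & FAKE_DEVCTL_BIT7))
                        return 0;
                /* a real driver would likely insert a short delay here */
        }
        return -1;
}
```

With the bit stuck at 1 and ticks frozen, poll_bit7_clear_timed() only 
returns via the guard, whereas poll_bit7_clear_bounded() gives up after 
exactly max_polls reads.  Whether a bounded busy-wait is acceptable at 
all in this handler is a separate question.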

So the question for those familiar with the subsystem is:  What is 
the proper fix?  Before the patch that introduced the jiffies timeout 
(594632efbb usb: musb: Adding musb support for OMAP4430 - author Hema HK 
<hemahk@xxxxxx>), the routine in question had no 1 second timeout in 
interrupt context, and that seemed okay.

- Tim
