Re: [beagleboard] EHCI softirq kernel panic

Joel A Fernandes <agnel.joel@xxxxxxxxx> · Thu, 11 Aug 2011 00:12:12 -0500

On Wed, Aug 10, 2011 at 4:35 PM, Felipe Balbi <balbi@xxxxxx> wrote:
> Hi,
>
> On Wed, Aug 10, 2011 at 10:11:48AM -0400, Alan Stern wrote:
>> On Wed, 10 Aug 2011, Felipe Balbi wrote:
>>
>> > Hi,
>> >
>> > On Tue, Aug 09, 2011 at 02:30:14PM -0400, Jason Kridner wrote:
>> > > On Tue, Aug 9, 2011 at 1:51 PM, Joel A Fernandes <agnel.joel@xxxxxxxxx> wrote:
>> > > > Anyone seen this before?
>> > >
>> > > A lot of the kernel developers don't frequent the beagleboard list.
>> > > If you think it is a general kernel bug, I suspect you want to copy
>> > > linux-omap.
>> >
>> > and linux-usb, and Alan Stern as he's the EHCI maintainer and myself for
>> > the OMAP USB part ;-)
>> >
>> > > > Booting 3.0.0 with OE patches from an SD card with a network
>> > > > cable connected results in the following traceback.
>> > > > Without a network cable connected, the errors go away.
>>
>> > > > [   99.084899] Unable to handle kernel NULL pointer dereference at virtual address 00000000
>> > > > [   99.093383] pgd = c0004000
>> > > > [   99.096191] [00000000] *pgd=00000000
>> > > > [   99.099945] Internal error: Oops: 17 [#2]
>> > > > [   99.104125] Modules linked in: ipv6
>> > > > [   99.107788] CPU: 0    Tainted: G      D      (3.0.0+ #1)
>> > > > [   99.113342] PC is at ehci_quiesce+0xc/0x94
>> > > > [   99.117614] LR is at ehci_stop+0x34/0x8c
>> > > > [   99.121734] pc : [<c0325ce4>]    lr : [<c0328bfc>]    psr: 200001d3
>> > > > [   99.121734] sp : c064de70  ip : 00000108  fp : c06b8624
>> > > > [   99.133728] r10: c064dec0  r9 : 00000000  r8 : dee08504
>> > > > [   99.139190] r7 : c0328b94  r6 : 00000100  r5 : dee08504  r4 : dee08608
>> > > > [   99.145996] r3 : 00000000  r2 : dee086ec  r1 : dee086b8  r0 : dee08608
>> > > > [   99.152832] Flags: nzCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment kernel
>> > > > [   99.160644] Control: 10c5387d  Table: 9d804019  DAC: 00000015
>> > > > [   99.166656] Process swapper (pid: 0, stack limit = 0xc064c2f0)
>> > > > [   99.172760] Stack: (0xc064de70 to 0xc064e000)
>>
>> > > > [   99.288482] [<c0325ce4>] (ehci_quiesce+0xc/0x94) from [<c0328bfc>] (ehci_stop+0x34/0x8c)
>> > > > [   99.296936] [<c0328bfc>] (ehci_stop+0x34/0x8c) from [<c007a3d4>] (run_timer_softirq+0x15c/0x1f8)
>> > > > [   99.306121] [<c007a3d4>] (run_timer_softirq+0x15c/0x1f8) from [<c064dec0>] (0xc064dec0)
>> > > > [   99.314483] Code: c05d7f9a e92d4073 e1a04000 e5903004 (e5933000)
>> > > > [   99.320892] ---[ end trace 4ae88755f08e391f ]---
>> > > > [   99.325714] Kernel panic - not syncing: Fatal exception in interrupt
>>
>> I'm puzzled.  Why is ehci_stop getting called in a softirq context?
>> That should never happen.  It should get called only when the driver is
>> unbound from the controller.
>
> Maybe some OpenEmbedded patch changed the behavior and ended up
> breaking the driver?
>

Hi Felipe,

Thanks for looking into this.

It could be the 1GHz OPP patch in OE; since reverting it [1], I haven't
seen this issue again.

Thanks,
Joel

[1] https://github.com/joelagnel/meta-texasinstruments/commit/95fc251b1aeafc1ef774659a8e8654e11b620778