Re: Reserved instruction in kernel code

"Maciej W. Rozycki" <macro@xxxxxxxxxxxxxx> · Wed, 2 Dec 2009 18:52:39 +0000 (GMT)

On Wed, 2 Dec 2009, David Daney wrote:

> > Reserved instruction in kernel code[#1]:
> > Cpu 0
> > $ 0   : 00000000 1000fc00 802be630 00000001
> > $ 4   : 802be670 802be674 ffffffff 802f4d4c
> > $ 8   : 1000fc01 1000001f 00000001 0000002b
> > $12   : 00000000 000001f5 07a0d380 00000000
> > $16   : 00000000 00000000 00000000 1000fc00
> > $20   : 802e9674 bd030f04 3e490000 00000a72
> > $24   : 00000008 8000167c
> > $28   : 802ba000 802bbd30 ffffffff 802f4d4c
> > Hi    : 000000fb
> > Lo    : 00000001
> > epc   : 801013a0 handle_ri_int+0x18/0x38
> >     Not tainted
> > ra    : 802f4d4c __log_buf+0x0/0x20000
> > Status: 1000fc03    KERNEL EXL IE
> > Cause : 50808000
> > PrId  : 00019365 (MIPS 24Kc)
> > Modules linked in:
> > Process swapper (pid: 0, threadinfo=802ba000, task=802bc000, tls=00000000)
> > Stack : 1000fc00 1000001f 00000001 0000002b 00000000 000001f5 00000000 1000fc00
> >         802be630 00000001 802be670 802be674 ffffffff 802f4d4c 1000fc00 1000001f
> >         00000001 0000002b 00000000 000001f5 07a0d380 00000000 00000000 00000000
> >         00000000 1000fc00 802e9674 bd030f04 3e490000 00000a72 00000008 8000167c
> >         802bbe94 802a2954 802ba000 802bbde0 ffffffff 802f4d4c 1000fc02 000000fb
> >         ...
> > Call Trace:
> > [<801013a0>] handle_ri_int+0x18/0x38
> >
> > Code: 01094025  3908001e  40886000 <00000040> 00000040  00000040
> 
>       ...   ssnop ssnop ssnop ...
> 
> One would think a 'PrId  : 00019365 (MIPS 24Kc)' would execute those.
> 
> The cause value indicates an 'Interrupt' but you are somehow executing in
> handle_ri_int, so it could be that multiple exceptions are messing up the OOPS
> output...
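For reference, the Cause value can be split into its fields to confirm what it reports. The following is a throwaway Python decoder, a sketch assuming the MIPS32 Release 2 Cause layout (BD in bit 31, TI in bit 30, IP7..IP0 in bits 15..8, ExcCode in bits 6..2); the helper names are made up for illustration. Applied to 0x50808000 it gives ExcCode 0 (Int) with only IP7 and TI set, i.e. a pending timer interrupt rather than a reserved instruction (ExcCode 10).

```python
# Cause register fields, assuming the MIPS32 Release 2 layout.
EXC_NAMES = {0: "Int", 10: "RI"}  # only the two codes that matter here


def decode_cause(cause: int) -> dict:
    """Split a 32-bit Cause value into the fields relevant to this oops."""
    return {
        "bd": (cause >> 31) & 0x1,         # exception hit a branch delay slot
        "ti": (cause >> 30) & 0x1,         # timer interrupt pending
        "ip": (cause >> 8) & 0xFF,         # IP7..IP0 pending-interrupt bits
        "exc_code": (cause >> 2) & 0x1F,   # ExcCode, bits 6..2
    }


fields = decode_cause(0x50808000)  # Cause from the oops above
print(EXC_NAMES.get(fields["exc_code"], "other"), hex(fields["ip"]), fields["ti"])
# prints: Int 0x80 1
```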

 Look at the preceding code: the EXL bit has just been cleared (by the 
mtc0 to Status in handle_ri_int()), so a pending interrupt exception has 
been taken immediately, overwriting the EPC and Cause registers with the 
values you can see above.  So either the original RI exception happened 
earlier, elsewhere, or something is completely broken somewhere and has 
produced this misleading dump (for example, stack corruption resulting in 
a jump to handle_ri() or whatever).
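The preceding code can be checked by hand. The sketch below uses a hypothetical Python decoder that only knows the handful of encodings present in the Code: line; it turns the words into `or t0, t0, t1`, `xori t0, t0, 0x1e` and `mtc0 t0, $12`, i.e. a write to CP0 Status, followed by ssnops. It then replays the or/xori pair with t1 = 0x1000001f taken from $9 in the dump. The prior Status value 0x1000fc02 is an assumption, but any value with the same upper bits gives the same result, because the or forces the low five bits to ones and the xori then clears all of them except IE. The result, 0x1000fc01, matches $8: IE set and EXL clear, so a pending interrupt can be taken at the first ssnop, which is where epc points.

```python
# Throwaway decoder: handles only the encodings that appear in the Code: line.
REG_NAMES = {8: "t0", 9: "t1"}  # o32 names for the two GPRs used here


def reg(n: int) -> str:
    return REG_NAMES.get(n, f"${n}")


def decode(word: int) -> str:
    op = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    if word == 0x00000040:                       # sll $0, $0, 1
        return "ssnop"
    if op == 0x00 and (word & 0x3F) == 0x25:     # SPECIAL, funct OR
        return f"or {reg(rd)}, {reg(rs)}, {reg(rt)}"
    if op == 0x0E:                               # XORI
        return f"xori {reg(rt)}, {reg(rs)}, {word & 0xFFFF:#x}"
    if op == 0x10 and rs == 0x04:                # COP0, MT
        return f"mtc0 {reg(rt)}, ${rd}"          # CP0 register 12 is Status
    return "unknown"


for w in (0x01094025, 0x3908001E, 0x40886000, 0x00000040):
    print(f"{w:08x}  {decode(w)}")

# Replay the or/xori pair on Status.
T1 = 0x1000001F               # $9 in the dump (CU0 plus the low five Status bits)
prior_status = 0x1000FC02     # hypothetical value on handler entry, EXL set
new_status = (prior_status | T1) ^ 0x1E
print(hex(new_status))        # 0x1000fc01, same as $8
print("IE =", new_status & 0x1, "EXL =", (new_status >> 1) & 0x1)
```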

 To figure out which is the case, I'd suggest running the RI handler with 
interrupts disabled for debugging and checking whether the correct EPC and 
Cause values are reported.  If they are, then obviously the code that 
triggers the RI exception has to be fixed, but the interrupt exception 
handler also has to be investigated to see why the EPC and Cause values 
saved on the stack get corrupted.  Otherwise the new symptoms will 
(hopefully) suggest what to do next.

  Maciej