Re: RED state exception (trap type 0x64) on U5 reboot


 



On 10/21/2013 04:58 AM, Meelis Roos wrote:
>Somewhere between 3.11.0 and 3.12-rc2, my U5-360 has consistently been
>hanging on reboot. Today I connected a serial cable and learned about a
>RED state exception. 3.10.0 and 3.11.0 are OK, 3.12-rc2 and later hang
>reliably. I have not yet started bisecting since this will need remote
>power cycle setup.
Another data point: the same problem happens on Sun Blade 100 with ALI
IDE. Does not happen on Fire V100 and Netra X1 that are also ALI IDE
based. The configs may be different too of course.

I did a bisect for full tree. It landed into tty commits, some of them
being untestable without a compile fix

Hi Meelis,

What tty commits required a compile fix?

but it came out clearly finally
(each bad commit was clearly bad, each good commit was tested for 3
reboots without a problem). Bisect resulted in this commit being at
fault:

8cb06c983822103da1cfe57b9901e60a00e61f67 is the first bad commit
commit 8cb06c983822103da1cfe57b9901e60a00e61f67
Author: Peter Hurley<peter@xxxxxxxxxxxxxxxxxx>
Date:   Sat Jun 15 10:21:18 2013 -0400

     n_tty: Remove alias ptrs in __receive_buf()

     The char and flag buffer local alias pointers, p and f, are
     unnecessary; remove them.

     Signed-off-by: Peter Hurley<peter@xxxxxxxxxxxxxxxxxx>
     Signed-off-by: Greg Kroah-Hartman<gregkh@xxxxxxxxxxxxxxxxxxx>

:040000 040000 ddc901fe810f43bc06a64397735b469b11e403e8 96d92e4e242c4b2ff11b25c005bccd093865b350 M  drivers

Reading the commit suggests that commit is not at fault - it seems so
unrelated. It just modifies on-stack function parameters instead of
local copies.

As you note, this is an unlikely culprit. Does a repeat bisect from
different good/bad starts give the same result?


Just reverting this patch in current master would not work, the code has
changed a lot.


Also, my match against oops_enter was wrong: the addresses differ by one
'5' (TPC f00455c0 versus oops_enter at 4555c0), so this has nothing to do
with oops_enter. There is no '00455c0' in the System.map of these kernels
either, so I have no idea what this TPC corresponds to.
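
The System.map comparison described above can be done mechanically rather than by eye. This is only a rough sketch: it assumes the usual "address type name" layout of System.map, the helper names (parse_system_map, dotted_to_int, nearest_symbol) are made up for illustration, and the second map entry is a placeholder, not a real symbol. Only the oops_enter address and the TPC value come from this thread.

```python
import bisect

# Only the oops_enter line comes from the thread; the second entry is a
# made-up placeholder so the lookup has more than one symbol to search.
SAMPLE_SYSTEM_MAP = """\
00000000004555c0 T oops_enter
0000000000455600 T hypothetical_later_symbol
"""


def parse_system_map(text):
    """Return a sorted list of (address, name) pairs from System.map text."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def dotted_to_int(value):
    """Convert the console's dotted hex (0000.0000.f004.55c0) to an int."""
    return int(value.replace(".", ""), 16)


def nearest_symbol(symbols, addr):
    """Return (name, offset) for the last symbol at or below addr, or None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base


symbols = parse_system_map(SAMPLE_SYSTEM_MAP)
tpc = dotted_to_int("0000.0000.f004.55c0")
name, offset = nearest_symbol(symbols, tpc)
print(f"TPC {tpc:#x} -> {name}+{offset:#x}")
```

An offset of a few bytes or kilobytes suggests a real match. An offset as large as the one this TPC produces means the address is nowhere near any symbol in the map, which agrees with the conclusion that it is not oops_enter.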

>
>reboot: Restarting system
>
>RED State Exception
>
>TL=0000.0000.0000.0005 TT=0000.0000.0000.0064
>    TPC=0000.0000.f000.4c80 TnPC=0000.0000.f000.4c84 TSTATE=0000.0099.1104.1402
>TL=0000.0000.0000.0004 TT=0000.0000.0000.0064
>    TPC=0000.0000.f000.4c80 TnPC=0000.0000.f000.4c84 TSTATE=0000.0099.1104.1402
>TL=0000.0000.0000.0003 TT=0000.0000.0000.0064
>    TPC=0000.0000.f000.4c80 TnPC=0000.0000.f000.4c84 TSTATE=0000.0099.1104.1402
>TL=0000.0000.0000.0002 TT=0000.0000.0000.0064
>    TPC=0000.0000.f000.0c80 TnPC=0000.0000.f000.0c84 TSTATE=0000.0099.1104.1402
>TL=0000.0000.0000.0001 TT=0000.0000.0000.0064
>    TPC=0000.0000.f004.55c0 TnPC=0000.0000.f004.55c4 TSTATE=0000.0099.1100.1602
>
>Trap Type 0x64 seems to be fast_instruction_access_MMU_miss. It keeps
>trapping until 5 levels deep. The first one is from f00455c0 that may
>be the System.map entry
>
>00000000004555c0 T oops_enter
>
>meaning we get late oops but the MMU setup has already been torn down?
>Is this a sensible way to decode this RED data (matching TPC against
>System.map)?
>
>Is full bisect recommended or does arch/sparc bisect look more
>promising?
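
For comparing dumps between kernels, the RED state output quoted above can be split into one record per trap level. This is a minimal sketch that assumes the console format is exactly as pasted in this thread (TL/TT on one line, TPC/TnPC/TSTATE on the next, optionally prefixed by mail quote markers); the function names are hypothetical.

```python
import re

# Dump as quoted in the thread, including the mail quote markers.
DUMP = """\
>TL=0000.0000.0000.0005 TT=0000.0000.0000.0064
>    TPC=0000.0000.f000.4c80 TnPC=0000.0000.f000.4c84 TSTATE=0000.0099.1104.1402
>TL=0000.0000.0000.0004 TT=0000.0000.0000.0064
>    TPC=0000.0000.f000.4c80 TnPC=0000.0000.f000.4c84 TSTATE=0000.0099.1104.1402
>TL=0000.0000.0000.0003 TT=0000.0000.0000.0064
>    TPC=0000.0000.f000.4c80 TnPC=0000.0000.f000.4c84 TSTATE=0000.0099.1104.1402
>TL=0000.0000.0000.0002 TT=0000.0000.0000.0064
>    TPC=0000.0000.f000.0c80 TnPC=0000.0000.f000.0c84 TSTATE=0000.0099.1104.1402
>TL=0000.0000.0000.0001 TT=0000.0000.0000.0064
>    TPC=0000.0000.f004.55c0 TnPC=0000.0000.f004.55c4 TSTATE=0000.0099.1100.1602
"""

ENTRY_RE = re.compile(
    r"TL=(?P<tl>[0-9a-f.]+)\s+TT=(?P<tt>[0-9a-f.]+)\s+"
    r"TPC=(?P<tpc>[0-9a-f.]+)\s+TnPC=(?P<tnpc>[0-9a-f.]+)\s+"
    r"TSTATE=(?P<tstate>[0-9a-f.]+)",
    re.IGNORECASE,
)


def dotted_hex(value):
    """Convert dotted hex such as 0000.0000.0000.0064 to an int."""
    return int(value.replace(".", ""), 16)


def parse_red_state(text):
    """Return one dict per trap level, in the order printed (deepest first)."""
    # Drop leading quote markers and indentation so the regex sees bare fields.
    cleaned = "\n".join(line.lstrip("> ") for line in text.splitlines())
    return [
        {key: dotted_hex(val) for key, val in match.groupdict().items()}
        for match in ENTRY_RE.finditer(cleaned)
    ]


for entry in parse_red_state(DUMP):
    print(f"TL={entry['tl']} TT={entry['tt']:#x} TPC={entry['tpc']:#x}")
```

Reading the records from the last one (TL=1) upwards gives the order in which the traps were taken, so the TL=1 entry is the first fault and the rest are the nested misses that followed it.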

Is any of the above exception information useful in diagnosing this?

Regards,
Peter Hurley