RE: Debugging early SError exception

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks Dirk,
Cheers,
Lior.

> -----Original Message-----
> From: Dirk Behme <dirk.behme@xxxxxxxxx>
> Sent: Tuesday, December 19, 2023 9:09 AM
> To: Lior Weintraub <liorw@xxxxxxxxxx>; linux-embedded@xxxxxxxxxxxxxxx
> Subject: Re: Debugging early SError exception
> 
> [You don't often get email from dirk.behme@xxxxxxxxx. Learn why this is
> important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> CAUTION: External Sender
> 
> Am 17.12.23 um 22:32 schrieb Lior Weintraub:
> > Hi,
> >
> > We have a new SoC with eLinux porting (kernel v6.5).
> > This SoC is ARM64 (A53) single core based device.
> > It runs correctly on QEMU but fails with SError on emulation platform
> (Synopsys Zebu running our SoC model).
> > There is no debugger connected to this emulation but there are several
> debug capabilities we can use:
> > 1. Generating wave dump of CPU signals
> > 2. Generate a Tarmac log
> > 3. UART
> >
> > Since the SError happens at early stages of Linux boot the UART is not
> enabled yet.
> >  From the Tarmac log we can see:
> >   3824884521 ps  ES  (ffff800080760888:d65f03c0) O el1h_ns:   ret
> (parse_early_param)
> >   3824884522 ps  ES  (ffff800080763a60:d2801800) O el1h_ns:   mov     x0,
> #0xc0   //      #192    (setup_arch)
> >                      R X0 (AARCH64) 00000000 000000c0
> >   3824884523 ps  ES  (ffff800080763a64:d51b4220) O el1h_ns:   msr
> daif,   x0      (setup_arch)
> >                      R CPSR 600000c5
> >   3824884529 ps  ES  System Error (Abort)
> >                      EXC [0x380] SError/vSError Current EL with SP_ELx
> >                      R ESR_EL1 (AARCH64) bf000002
> >                      R CPSR 600003c5
> >                      R SPSR_EL1 (AARCH64) 600000c5
> >                      R ELR_EL1 (AARCH64) ffff8000 80763a68
> >   3824884925 ps  ES  (ffff800080010b80:d10543ff) O el1h_ns:   sub     sp,
> sp,     #0x150  (vectors)
> >                      R SP_EL1 (AARCH64) ffff8000 808f3c50
> >   3824884925 ps  ES  (ffff800080010b84:8b2063ff) O el1h_ns:   add     sp,
> sp,     x0      (vectors)
> >                      R SP_EL1 (AARCH64) ffff8000 808f3d10
> >   3824884926 ps  ES  (ffff800080010b88:cb2063e0) O el1h_ns:   sub     x0,
> sp,     x0      (vectors)
> >                      R X0 (AARCH64) ffff8000 808f3c50
> >   3824884927 ps  ES  (ffff800080010b8c:37700080) O el1h_ns:   tbnz    w0,
> #14,    ffff800080010b9c        <vectors+0x39c>         (vectors)
> >   3824884935 ps  ES  (ffff800080010b90:cb2063e0) O el1h_ns:   sub     x0,
> sp,     x0      (vectors)
> >                      R X0 (AARCH64) 00000000 000000c0
> >   3824884937 ps  ES  (ffff800080010b94:cb2063ff) O el1h_ns:   sub     sp,
> sp,     x0      (vectors)
> >                      R SP_EL1 (AARCH64) ffff8000 808f3c50
> >   3824884938 ps  ES  (ffff800080010b98:140001ef) O el1h_ns:   b
> ffff800080011354        <el1h_64_error>         (vectors)
> >
> > If I understand correctly, the exception happened sometime earlier and only
> now Linux boot code (setup_arch) opened the exception handling and as a
> result we immediately jump to the SError exception handler.
> 
> 
> Yes, that sounds reasonable. If I understood correctly, you are
> running something "quite new" on some software (QEMU) and hardware
> (Synopsis) simulators.
> 
> That would mean that you have new hardware with e.g. new memory map
> not used before. What you describe might sound like in the code before
> Linux (boot loader) there is anything resulting in the SError. This
> might be an access to non-existing or non-enabled hardware. I.e. it
> might be that you try to access (read/write) an address what is not
> available, yet (or just invalid). It's hard to debug that. In case you
> are able to modify the code before Linux (the boot loader?) you might
> try to enable SError exceptions, there, too. To get it earlier and
> with that make the search window smaller. I'm not that familiar with
> QEMU, but could you try to trace which (all?) hardware accesses your
> code does. And with that analyse all accesses and with that check if
> all these accesses are valid even on the hardware (Synopsis) emulation
> system? That should be checked from valid address and from hardware
> subsystem enablement point of view.
> 
> Hth,
> 
> Dirk
> 
> 
> >  From the Linux source:
> >       parse_early_param();
> >
> >       dynamic_scs_init();
> >
> >       /*
> >        * Unmask asynchronous aborts and fiq after bringing up possible
> >        * earlycon. (Report possible System Errors once we can report this
> >        * occurred).
> >        */
> >       local_daif_restore(DAIF_PROCCTX_NOIRQ); <---- This is when we get the
> exception.
> >
> > After some kernel hacking (replacing printk) we could extract the logs:
> > 6Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> > 5Linux version 6.5.0 (pliops@dev-liorw) (aarch64-buildroot-linux-gnu-
> gcc.br_real (Buildroot 2023.02.1-95-g8391404e23) 11.3.0, GNU ld (GNU
> Binutils) 2.38) #101 SMP Sun Dec 17 20:09:06 IST 2023
> > 6Machine model: Pliops Spider MK-I EVK
> > 2SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
> > CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0 #101
> > Hardware name: Pliops Spider MK-I EVK (DT)
> > pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : setup_arch+0x13c/0x5ac
> > lr : setup_arch+0x134/0x5ac
> > sp : ffff8000808f3da0
> > x29: ffff8000808f3da0c x28: 0000000008758074c x27:
> 0000000005e31b58c
> > x26: 0000000000000001c x25: 0000000007e5f728c x24:
> ffff8000808f8000c
> > x23: ffff8000808f8600c x22: ffff8000807b6000c x21: ffff800080010000c
> > x20: ffff800080a1e000c x19: fffffbfffddfe190c x18: 000000002266684ac
> > x17: 00000000fcad60bbc x16: 0000000000001800c x15:
> 0000000000000008c
> > x14: ffffffffffffffffc x13: 0000000000000000c x12: 0000000000000003c
> > x11: 0101010101010101c x10: ffffffffffee87dfc x9 : 0000000000000038c
> > x8 : 0101010101010101c x7 : 7f7f7f7f7f7f7f7fc x6 : 0000000000000001c
> > x5 : 0000000000000000c x4 : 8000000000000000c x3 :
> 0000000000000065c
> > x2 : 0000000000000000c x1 : 0000000000000000c x0 :
> 00000000000000c0c
> > 0Kernel panic - not syncing: Asynchronous SError Interrupt
> > CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0 #101
> > Hardware name: Pliops Spider MK-I EVK (DT)
> > Call trace:
> >   dump_backtrace+0x9c/0xd0
> >   show_stack+0x14/0x1c
> >   dump_stack_lvl+0x44/0x58
> >   dump_stack+0x14/0x1c
> >   panic+0x2e0/0x33c
> >   nmi_panic+0x68/0x6c
> >   arm64_serror_panic+0x68/0x78
> >   do_serror+0x24/0x54
> >   el1h_64_error_handler+0x2c/0x40
> >   el1h_64_error+0x64/0x68
> >   setup_arch+0x13c/0x5ac
> >   start_kernel+0x5c/0x5b8
> >   __primary_switched+0xb4/0xbc
> > 0---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
> >
> > Can you please advice how to proceed with debugging?
> >
> > Thanks in advanced,
> > Cheers,
> > Lior.
> >
> >
> 





[Index of Archives]     [Gstreamer Embedded]     [Linux MMC Devel]     [U-Boot V2]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux ARM Kernel]     [Linux OMAP]     [Linux SCSI]

  Powered by Linux