On 21.12.23 at 11:04, Lior Weintraub wrote:
Thanks Dirk,
Regarding earlyprintk, I'm not sure I know how to make it work.
I have set CONFIG_EARLY_PRINTK=y and CONFIG_DEBUG_LL=y in my config, but it doesn't seem to work.
Do I need to pass something in the bootargs from U-Boot?
Do I need to add it to my device tree?
(I tried setting bootargs = "console=ttyS0,115200 earlyprintk"; under "chosen" in my DT, but it didn't work.)
Yes, what has to be enabled (and what not), and how things have to be set,
is often confusing. I don't think this is the same on all systems, so to
be on the safe side you have to look into the code for your system.
In short: the code is the documentation ;)
The UART I am using is "snps,dw-apb-uart".
Last week, to get the early logs out, I implemented this hack:
1. Modify the printk macro to call my print_func
2. This print_func writes the characters into a single global variable (u32 simul_uart;)
3. Get the address of this global variable and extract all writes to it from the Tarmac logs.
This is a very slow and tedious process, but it helped me identify the initial SError.
Initially I thought I could write directly into the UART FIFO register (whose physical address I know), but this didn't work because Linux had already set up the MMU, so I guess I need to know the virtual address of this FIFO.
Do I need to use __phys_to_virt or something similar?
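A minimal sketch of the print_func hack described above, written as plain C so it can be compiled and tried on a host. Only print_func and simul_uart come from the description; formatting into a 256-byte stack buffer with vsnprintf() is an assumption (inside the kernel, the kernel's own vsnprintf() would play that role). Declaring the variable volatile keeps the compiler from merging the per-character stores, so every character shows up as a separate write to the variable's address in the Tarmac log.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Single global "fake UART" word. Each store to it appears in the Tarmac
 * log as a write to this variable's address; volatile keeps the compiler
 * from collapsing the per-character stores into one. */
volatile uint32_t simul_uart;

/* printk replacement (sketch): format into a local buffer, then emit one
 * character per store. Returns the number of characters emitted. */
int print_func(const char *fmt, ...)
{
    char buf[256]; /* assumed size; longer messages are truncated */
    va_list args;
    int len;

    va_start(args, fmt);
    len = vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);

    if (len < 0)
        return len;
    if (len >= (int)sizeof(buf))
        len = (int)sizeof(buf) - 1; /* vsnprintf reports the untruncated length */

    for (int i = 0; i < len; i++)
        simul_uart = (uint32_t)(unsigned char)buf[i];

    return len;
}
```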
Yes, I think so. Have a look at the existing serial driver, too. It
should do what's needed, and you can borrow from it.
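One caveat on __phys_to_virt(): on arm64 it is simple arithmetic on the kernel's linear map, which covers RAM only, so for a device register such as the UART FIFO it generally yields an address with nothing mapped behind it. Device registers are normally mapped explicitly, e.g. with ioremap(), and on arm64 earlycon typically maps its UART through a fixmap slot. The host-side sketch below only models the linear-map arithmetic; the EX_PAGE_OFFSET, EX_PHYS_OFFSET and EX_RAM_SIZE values are example assumptions, not this SoC's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Example values only (assumptions): a 48-bit VA kernel and 2 GiB of RAM
 * starting at physical 0x80000000. Real values come from the kernel
 * configuration and the memory nodes in the device tree. */
#define EX_PAGE_OFFSET 0xffff000000000000ULL
#define EX_PHYS_OFFSET 0x0000000080000000ULL
#define EX_RAM_SIZE    0x0000000080000000ULL

/* Same shape as arm64's __phys_to_virt(): (phys - PHYS_OFFSET) | PAGE_OFFSET.
 * The result is only meaningful if phys lies inside linear-mapped RAM. */
uint64_t ex_phys_to_virt(uint64_t phys)
{
    return (phys - EX_PHYS_OFFSET) | EX_PAGE_OFFSET;
}

/* True if phys is inside the (example) RAM range covered by the linear map. */
bool ex_in_linear_map(uint64_t phys)
{
    return phys >= EX_PHYS_OFFSET && phys - EX_PHYS_OFFSET < EX_RAM_SIZE;
}
```

With these example values a hypothetical UART at 0x10000000 is outside the linear map, so translating it with this arithmetic would not give a usable pointer.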
Best regards
Dirk
Cheers,
Lior.
-----Original Message-----
From: Dirk Behme <dirk.behme@xxxxxxxxx>
Sent: Thursday, December 21, 2023 10:30 AM
To: Lior Weintraub <liorw@xxxxxxxxxx>; linux-embedded@xxxxxxxxxxxxxxx
Subject: Re: Debugging early SError exception
On 21.12.23 at 08:43, Lior Weintraub wrote:
Hi Dirk,
We found that the issue was in the early stages of Barebox (a.k.a. U-Boot
v2).
Glad to hear that! :)
Our implementation of putc_ll (in debug_ll) was writing into the UART Tx
FIFO without checking whether the FIFO was full.
Once the FIFO got full, it caused this SError, probably because the UART IP
generated an apberror signal.
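A putc_ll that waits for room in the transmitter before writing avoids that overflow. The sketch below assumes the 16550-compatible register layout that snps,dw-apb-uart provides, with 32-bit registers 4 bytes apart (reg-shift 2): THR at index 0 and LSR at index 5 (byte offset 0x14), where LSR bit 5 (THRE) means the transmit holding register/FIFO is empty, assuming the DesignWare programmable THRE mode (IER.PTIME) is left disabled. Polling THRE is conservative, since it waits for the FIFO to drain completely. The function name and the poll bound are hypothetical, and the register block is passed as a pointer so the logic can be exercised against a plain array.

```c
#include <stdint.h>

/* Register indices for a 16550-compatible UART with reg-shift 2
 * (byte offset = index * 4): THR at 0x00, LSR at 0x14. */
#define UART_THR      0u
#define UART_LSR      5u
#define UART_LSR_THRE 0x20u /* TX holding register / FIFO empty */

/* Hypothetical poll budget so a stuck UART cannot hang the boot forever. */
#define PUTC_LL_MAX_POLLS 100000u

/* Write one character only after the transmitter reports room.
 * Returns 0 on success, -1 if THRE never became set within the budget. */
int putc_ll_checked(volatile uint32_t *uart, char c)
{
    for (uint32_t i = 0; i < PUTC_LL_MAX_POLLS; i++) {
        if (uart[UART_LSR] & UART_LSR_THRE) {
            uart[UART_THR] = (uint32_t)(unsigned char)c;
            return 0;
        }
    }
    return -1; /* transmitter stayed busy: drop the character, don't overflow */
}
```

In the boot loader, `uart` would point at the UART's MMIO base address; returning an error instead of spinning forever keeps a misbehaving UART from also hanging the boot.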
Thanks for the report!
Now Linux is running and no longer reports the SError, but we face
another issue.
We see that the PC ends up in the "report_bug" function.
Linux doesn't print anything to the UART (probably because it hasn't reached
the point where the console is configured?).
For cases like this, earlyprintk is usually a good option. Check whether
the Linux kernel serial console (UART) driver of your SoC supports it.
In the end it should be "just" a function in the serial console driver
that outputs the console data via polling, before the interrupt-driven
console part takes over later.
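On arm64 this early polling console is provided by the earlycon framework (CONFIG_SERIAL_EARLYCON, enabled with the `earlycon` kernel parameter); CONFIG_EARLY_PRINTK and CONFIG_DEBUG_LL are 32-bit ARM options. The core of such a polling write is roughly the loop below: each byte is handed to a polling putchar, with "\r" emitted before every "\n" (in the kernel, uart_console_write() performs this translation). This is a host-side model, not driver code; early_write, early_putc_fn and buf_sink are hypothetical names, and the sink simply collects the characters so the behaviour can be checked.

```c
#include <stddef.h>

/* Polling "put one character" callback; in a real driver this would spin
 * on the line status register and then write the TX register. */
typedef void (*early_putc_fn)(void *ctx, char c);

/* Write count bytes from s, translating "\n" into "\r\n" as serial
 * terminals expect. Returns the number of characters passed to putc_fn. */
size_t early_write(void *ctx, early_putc_fn putc_fn, const char *s, size_t count)
{
    size_t sent = 0;

    for (size_t i = 0; i < count; i++) {
        if (s[i] == '\n') {
            putc_fn(ctx, '\r');
            sent++;
        }
        putc_fn(ctx, s[i]);
        sent++;
    }
    return sent;
}

/* Demo sink: collects characters into a NUL-terminated buffer. */
struct buf_sink {
    char data[64];
    size_t len;
};

void buf_sink_putc(void *ctx, char c)
{
    struct buf_sink *sink = ctx;

    if (sink->len < sizeof(sink->data) - 1) {
        sink->data[sink->len++] = c;
        sink->data[sink->len] = '\0';
    }
}
```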
Best regards
Dirk
Since our debugging means are limited, it may take some time to find the
root cause.
I will keep you posted on our findings.
Love to hear your thoughts,
Cheers,
Lior.
-----Original Message-----
From: Dirk Behme <dirk.behme@xxxxxxxxx>
Sent: Tuesday, December 19, 2023 3:37 PM
To: Lior Weintraub <liorw@xxxxxxxxxx>; linux-embedded@xxxxxxxxxxxxxxx
Subject: Re: Debugging early SError exception
On 19.12.23 at 14:23, Lior Weintraub wrote:
Thanks Dirk,
Welcome :)
If you find the root cause, it would be nice to get a generic
description of it so that we can learn something :)
Best regards
Dirk
-----Original Message-----
From: Dirk Behme <dirk.behme@xxxxxxxxx>
Sent: Tuesday, December 19, 2023 9:09 AM
To: Lior Weintraub <liorw@xxxxxxxxxx>; linux-embedded@xxxxxxxxxxxxxxx
Subject: Re: Debugging early SError exception
On 17.12.23 at 22:32, Lior Weintraub wrote:
Hi,
We have a new SoC with an embedded Linux port (kernel v6.5).
This SoC is an ARM64 (Cortex-A53) single-core device.
It runs correctly on QEMU but fails with an SError on the emulation platform
(Synopsys ZeBu running our SoC model).
There is no debugger connected to this emulation, but there are several
debug capabilities we can use:
1. Generating a wave dump of CPU signals
2. Generating a Tarmac log
3. UART
Since the SError happens in the early stages of the Linux boot, the UART is not
enabled yet.
From the Tarmac log we can see:
3824884521 ps ES (ffff800080760888:d65f03c0) O el1h_ns: ret (parse_early_param)
3824884522 ps ES (ffff800080763a60:d2801800) O el1h_ns: mov x0, #0xc0 // #192 (setup_arch)
R X0 (AARCH64) 00000000 000000c0
3824884523 ps ES (ffff800080763a64:d51b4220) O el1h_ns: msr daif, x0 (setup_arch)
R CPSR 600000c5
3824884529 ps ES System Error (Abort)
EXC [0x380] SError/vSError Current EL with SP_ELx
R ESR_EL1 (AARCH64) bf000002
R CPSR 600003c5
R SPSR_EL1 (AARCH64) 600000c5
R ELR_EL1 (AARCH64) ffff8000 80763a68
3824884925 ps ES (ffff800080010b80:d10543ff) O el1h_ns: sub sp, sp, #0x150 (vectors)
R SP_EL1 (AARCH64) ffff8000 808f3c50
3824884925 ps ES (ffff800080010b84:8b2063ff) O el1h_ns: add sp, sp, x0 (vectors)
R SP_EL1 (AARCH64) ffff8000 808f3d10
3824884926 ps ES (ffff800080010b88:cb2063e0) O el1h_ns: sub x0, sp, x0 (vectors)
R X0 (AARCH64) ffff8000 808f3c50
3824884927 ps ES (ffff800080010b8c:37700080) O el1h_ns: tbnz w0, #14, ffff800080010b9c <vectors+0x39c> (vectors)
3824884935 ps ES (ffff800080010b90:cb2063e0) O el1h_ns: sub x0, sp, x0 (vectors)
R X0 (AARCH64) 00000000 000000c0
3824884937 ps ES (ffff800080010b94:cb2063ff) O el1h_ns: sub sp, sp, x0 (vectors)
R SP_EL1 (AARCH64) ffff8000 808f3c50
3824884938 ps ES (ffff800080010b98:140001ef) O el1h_ns: b ffff800080011354 <el1h_64_error> (vectors)
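Decoding the PSTATE values in this trace shows why the SError surfaces exactly in setup_arch: `msr daif, x0` writes 0xc0, which sets only the I (bit 7) and F (bit 6) masks and leaves the A (SError) mask, bit 8, clear. 0xc0 is also the value of DAIF_PROCCTX_NOIRQ (PSR_I_BIT | PSR_F_BIT) passed to local_daif_restore(). The pending SError is therefore taken right away, and ELR_EL1 (ffff8000 80763a68) points at the instruction just after the msr. On exception entry CPSR becomes 600003c5 (D, A, I and F all masked, EL1h), while SPSR_EL1 keeps the previous 600000c5. The sketch below only encodes the architectural bit positions; the helper names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* AArch64 PSTATE/SPSR bit positions (same layout as the DAIF register). */
#define PSR_F_BIT     (1u << 6) /* FIQ masked */
#define PSR_I_BIT     (1u << 7) /* IRQ masked */
#define PSR_A_BIT     (1u << 8) /* SError (asynchronous abort) masked */
#define PSR_D_BIT     (1u << 9) /* debug exceptions masked */
#define PSR_MODE_MASK 0xfu      /* M[3:0]: 0x5 = EL1h, 0x4 = EL1t */

/* True if a pending SError would be taken with this PSTATE value. */
bool serror_unmasked(uint32_t pstate)
{
    return (pstate & PSR_A_BIT) == 0;
}

/* Returns M[3:0], the exception level and stack pointer selection. */
uint32_t psr_mode(uint32_t pstate)
{
    return pstate & PSR_MODE_MASK;
}
```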
If I understand correctly, the exception happened sometime earlier, and only
now has the Linux boot code (setup_arch) unmasked SError exceptions; as a
result, we immediately jump to the SError exception handler.
Yes, that sounds reasonable. If I understood correctly, you are
running something "quite new" on a software (QEMU) and a hardware
(Synopsys) simulator.
That would mean that you have new hardware with, e.g., a new memory map
that has not been used before. What you describe sounds like something
in the code that runs before Linux (the boot loader) is causing the
SError. This might be an access to non-existing or not-yet-enabled
hardware, i.e. you might be accessing (reading/writing) an address that
is not available yet (or is simply invalid). That's hard to debug. If you
are able to modify the code that runs before Linux (the boot loader?),
you might try to enable SError exceptions there, too. That way you get
the exception earlier and narrow the search window. I'm not that
familiar with QEMU, but could you try to trace which (all?) hardware
accesses your code makes, analyse them, and check whether all of these
accesses are also valid on the hardware (Synopsys) emulation system?
That should be checked both from the valid-address and from the
hardware-subsystem-enablement point of view.
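Once the accesses have been exported from the QEMU or Tarmac trace, the check suggested above (does every access land in a region that exists and is enabled on the emulated SoC?) can be automated. A minimal sketch, assuming a hand-written region table; the region names, addresses and "enabled" flags below are invented placeholders, not the real SoC memory map.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a (hypothetical) SoC memory map. */
struct mem_region {
    const char *name;
    uint64_t base;
    uint64_t size;
    bool enabled; /* false if the block is absent or unclocked on the emulator */
};

/* Placeholder map: replace with the addresses from the SoC documentation. */
static const struct mem_region soc_map[] = {
    { "dram",  0x80000000ULL, 0x40000000ULL, true  },
    { "uart0", 0x10000000ULL, 0x00001000ULL, true  },
    { "pcie",  0x40000000ULL, 0x10000000ULL, false },
};

/* Returns the region containing [addr, addr + len), or NULL if the access
 * hits a hole or straddles a region boundary (overflow-safe comparison). */
const struct mem_region *find_region(uint64_t addr, uint64_t len)
{
    for (size_t i = 0; i < sizeof(soc_map) / sizeof(soc_map[0]); i++) {
        const struct mem_region *r = &soc_map[i];

        if (addr >= r->base && len <= r->size && addr - r->base <= r->size - len)
            return r;
    }
    return NULL;
}

/* An access is considered valid only if it lands in an enabled region. */
bool access_is_valid(uint64_t addr, uint64_t len)
{
    const struct mem_region *r = find_region(addr, len);

    return r != NULL && r->enabled;
}
```

Feeding every traced (address, size) pair through access_is_valid() and printing the failures gives the list of suspicious accesses to inspect first.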
Hth,
Dirk
From the Linux source:
parse_early_param();
dynamic_scs_init();
/*
* Unmask asynchronous aborts and fiq after bringing up possible
* earlycon. (Report possible System Errors once we can report this
* occurred).
*/
local_daif_restore(DAIF_PROCCTX_NOIRQ); <---- This is when we get the exception.
After some kernel hacking (replacing printk) we could extract the logs:
6Booting Linux on physical CPU 0x0000000000 [0x410fd034]
5Linux version 6.5.0 (pliops@dev-liorw) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2023.02.1-95-g8391404e23) 11.3.0, GNU ld (GNU Binutils) 2.38) #101 SMP Sun Dec 17 20:09:06 IST 2023
6Machine model: Pliops Spider MK-I EVK
2SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0 #101
Hardware name: Pliops Spider MK-I EVK (DT)
pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : setup_arch+0x13c/0x5ac
lr : setup_arch+0x134/0x5ac
sp : ffff8000808f3da0
x29: ffff8000808f3da0 x28: 0000000008758074 x27: 0000000005e31b58
x26: 0000000000000001 x25: 0000000007e5f728 x24: ffff8000808f8000
x23: ffff8000808f8600 x22: ffff8000807b6000 x21: ffff800080010000
x20: ffff800080a1e000 x19: fffffbfffddfe190 x18: 000000002266684a
x17: 00000000fcad60bb x16: 0000000000001800 x15: 0000000000000008
x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000003
x11: 0101010101010101 x10: ffffffffffee87df x9 : 0000000000000038
x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 0000000000000001
x5 : 0000000000000000 x4 : 8000000000000000 x3 : 0000000000000065
x2 : 0000000000000000 x1 : 0000000000000000 x0 : 00000000000000c0
0Kernel panic - not syncing: Asynchronous SError Interrupt
CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0 #101
Hardware name: Pliops Spider MK-I EVK (DT)
Call trace:
dump_backtrace+0x9c/0xd0
show_stack+0x14/0x1c
dump_stack_lvl+0x44/0x58
dump_stack+0x14/0x1c
panic+0x2e0/0x33c
nmi_panic+0x68/0x6c
arm64_serror_panic+0x68/0x78
do_serror+0x24/0x54
el1h_64_error_handler+0x2c/0x40
el1h_64_error+0x64/0x68
setup_arch+0x13c/0x5ac
start_kernel+0x5c/0x5b8
__primary_switched+0xb4/0xbc
0---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
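For reference, the syndrome reported here (ESR_EL1 = bf000002, printed by the kernel as "code 0x00000000bf000002") decodes with the architectural ESR field layout as EC = 0x2f (SError interrupt), IL = 1, and ISS bit 24 (IDS) = 1, meaning the remaining syndrome bits are IMPLEMENTATION DEFINED. The architecture therefore gives no further detail about the cause, and the kernel treats an SError with an implementation-defined syndrome as fatal, which matches the do_serror -> arm64_serror_panic path in the call trace above. The decoder below is an illustrative sketch; the function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_EC_SHIFT  26
#define ESR_EC_MASK   0x3fu
#define ESR_IL_BIT    (1u << 25)
#define ESR_ISS_MASK  0x1ffffffu /* ISS = bits [24:0] */
#define ESR_IDS_BIT   (1u << 24) /* SError: implementation-defined syndrome */
#define ESR_EC_SERROR 0x2fu

/* Exception class, bits [31:26]. */
uint32_t esr_ec(uint32_t esr) { return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK; }

/* Instruction-specific syndrome, bits [24:0]. */
uint32_t esr_iss(uint32_t esr) { return esr & ESR_ISS_MASK; }

/* Instruction length bit (1 = 32-bit instruction / not applicable). */
bool esr_il(uint32_t esr) { return (esr & ESR_IL_BIT) != 0; }

/* True if this is an SError whose syndrome carries no architectural detail. */
bool serror_is_impl_defined(uint32_t esr)
{
    return esr_ec(esr) == ESR_EC_SERROR && (esr & ESR_IDS_BIT) != 0;
}
```

Interpreting the low implementation-defined bits requires the core's technical reference manual.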
Can you please advise how to proceed with debugging?
Thanks in advance,
Cheers,
Lior.