Hi,

The GICv3 issue was resolved after:
1. Setting bit 0 (SRE) and bit 3 (Enable) in ICC_SRE_EL3 (we don't have virtualization support and hence ICC_SRE_EL2 is not supported).
2. Powering up the GICR at EL3.

The earlycon issue was resolved after:
1. Adding "earlycon=uart8250,mmio32,0xd000307000,115200n8" to the boot args.
2. Adding "CONFIG_SERIAL_8250_CONSOLE=y" to the config (previously it had only CONFIG_SERIAL_8250=y).

Now I face a new issue:
Linux boot hangs on "wait for interrupt" at cpu_do_idle.
The program counter is stuck at 0xffff8000805ae45c.
ffff8000805ae454 <cpu_do_idle>:
ffff8000805ae454: d5033f9f dsb sy
ffff8000805ae458: d503207f wfi
ffff8000805ae45c: d65f03c0 ret

I think that something is wrong with the timer or GIC settings and, as a result, the scheduler doesn't get the interrupts (timer ticks).

Additional info that might be relevant to this issue:
The emulation platform runs at about 2.8MHz.
CNTFRQ_EL0 is set to 2M (because the emulation platform's running frequency varies between 1.9 and 2.8MHz).
The reason for those settings is to allow Linux to run as it would in the "real" world.

It is my understanding that there are 2 issues here:
1. Something is wrong with the timer/interrupt settings (note that the same configuration runs correctly on QEMU).
2. Something is wrong with the initramfs: according to the kernel source, it seems to fail to open "/dev/console".

The full Linux boot log:
Booting Linux on physical CPU 0x0000000000 [0x410fd034]
Linux version 6.5.0 (pliops@dev-liorw) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2023.02.1-95-g8391404e23) 11.3.0, GNU ld (GNU Binuti) 2.38) #112 SMP Sun Dec 24 15:44:56 IST 2023
Machine model: Pliops Spider MK-I EVK
earlycon: uart8250 at MMIO32 0x000000d000307000 (options '115200n8')
printk: bootconsole [uart8250] enabled
efi: UEFI not found.
Zone ranges:
  DMA      [mem 0x0000000000000000-0x000000002fffffff]
  DMA32    empty
  Normal   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000000000000-0x000000002fffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x000000002fffffff]
percpu: Embedded 25 pages/cpu s64800 r8192 d29408 u102400
Detected VIPT I-cache on CPU0
CPU features: detected: GIC system register CPU interface
CPU features: detected: ARM erratum 845719
alternatives: applying boot alternatives
Kernel command line: console=ttyS0,115200n8 earlycon=uart8250,mmio32,0xd000307000,115200n8
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
Built 1 zonelists, mobility grouping on. Total pages: 193536
mem auto-init: stack:off, heap alloc:off, heap free:off
software IO TLB: area num 1.
software IO TLB: mapped [mem 0x000000002b080000-0x000000002f080000] (64MB)
Memory: 689240K/786432K available (5824K kernel code, 1186K rwdata, 1612K rodata, 1600K init, 400K bss, 97192K reserved, 0K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
trace event string verifier disabled
rcu: Hierarchical RCU implementation.
rcu: RCU event tracing is enabled.
rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=1.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
GICv3: 96 SPIs implemented
GICv3: 0 Extended SPIs implemented
Root IRQ handler: gic_handle_irq
GICv3: GICv3 features: 16 PPIs
GICv3: CPU0: found redistributor 0 region 0:0x000000e000060000
ITS [mem 0xe000040000-0xe00005ffff]
ITS@0x000000e000040000: allocated 8192 Devices @a0000 (indirect, esz 8, psz 64K, shr 1)
ITS@0x000000e000040000: allocated 32768 Interrupt Collections @b0000 (flat, esz 2, psz 64K, shr 1)
GICv3: Expected reserved range [0x00000000000c0000:0x00000000000cffff], not found
GICv3: using LPI property table @0x00000000000c0000
GICv3: CPU0: Booted with LPIs enabled, memory probably corrupted
CPU0: Failed to disable LPIs
rcu: srcu_init: Setting srcu_struct sizes based on contention.
arch_timer: cp15 timer(s) running at 62.50MHz (virt).
clocksource: arch_sys_counter: mask: 0x1ffffffffffffff max_cycles: 0x1cd42e208c, max_idle_ns: 881590405314 ns
sched_clock: 57 bits at 63MHz, resolution 16ns, wraps every 4398046511096ns
Console: colour dummy device 80x25
Calibrating delay loop (skipped), value calculated using timer frequency.. 125.00 BogoMIPS (lpj=250000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
cacheinfo: Unable to detect cache hierarchy for CPU 0
rcu: Hierarchical SRCU implementation.
rcu: Max phase no-delay instances is 1000.
Platform MSI: gic-its@E000040000 domain created
PCI/MSI: /soc/interrupt-controller@E000000000/gic-its@E000040000 domain created
EFI services will not be available.
smp: Bringing up secondary CPUs ...
smp: Brought up 1 node, 1 CPU
SMP: Total of 1 processors activated.
CPU features: detected: 32-bit EL0 Support
CPU features: detected: CRC32 instructions
CPU: All CPU(s) started at EL1
alternatives: applying system-wide alternatives
devtmpfs: initialized
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
futex hash table entries: 256 (order: 2, 16384 bytes, linear)
DMI not present or invalid.
DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
ASID allocator initialised with 65536 entries
Serial: AMBA PL011 UART driver
Modules: 30080 pages in range for non-PLT usage
Modules: 521600 pages in range for PLT usage
iommu: Default domain type: Translated
iommu: DMA domain TLB invalidation policy: strict mode
SCSI subsystem initialized
vgaarb: loaded
clocksource: Switched to clocksource arch_sys_counter
PCI: CLS 0 bytes, default 64
workingset: timestamp_bits=46 max_order=18 bucket_order=0
fuse: init (API version 7.38)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
io scheduler mq-deadline registered
io scheduler kyber registered
Unpacking initramfs...
Freeing initrd memory: 4596K
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
clk: Disabling unused clocks
Warning: unable to open an initial console.
Freeing unused kernel memory: 1600K

Thanks in advance for your great advice and support,
Cheers,
Lior.

> -----Original Message-----
> From: Heiko Schocher <hs@xxxxxxx>
> Sent: Friday, December 22, 2023 10:04 AM
> To: Dirk Behme <dirk.behme@xxxxxxxxx>; Lior Weintraub <liorw@xxxxxxxxxx>
> Cc: linux-embedded@xxxxxxxxxxxxxxx
> Subject: Re: Debugging early SError exception
>
> [You don't often get email from hs@xxxxxxx.
> Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> CAUTION: External Sender
>
> Hello Dirk, Lior,
>
> On 22.12.23 08:48, Dirk Behme wrote:
> > On 22.12.23 at 08:03, Lior Weintraub wrote:
> >> Hi,
> >>
> >> I managed to dump the __log_buf but for some reason the UART is still not working.
> >> Please note that the UART printed all the U-BOOT traces so, AFAIU, the device tree is set correctly.
> >> (Barebox is passing its DTB to the kernel.)
> >>
> >> To enable the earlyprintk I have:
> >> 1. Compiled the kernel with CONFIG_EARLY_PRINTK=y and CONFIG_DEBUG_LL=y
> >> 2. Modified the boot args to include: "console=ttyS0,115200n8 earlycon=dw-apb-uart,0xd000307000"
> >> 3. Verified that the dw-apb-uart driver (8250_early.c) supports earlycon:
> >>    OF_EARLYCON_DECLARE(uart, "snps,dw-apb-uart", early_serial8250_setup);
> >>
> >> From the __log_buf dump:
> >> Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> >> Linux version 6.5.0 (pliops@dev-liorw) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2023.02.1-95-g8391404e23) 11.3.0, GNU ld (GNU Binutils) 2.38) #107 SMP Thu Dec 21 17:33:12 IST 2023
> >> Machine model: Pliops Spider MK-I EVK
> >> efi: UEFI not found.
> >> Zone ranges:
> >>   DMA      [mem 0x0000000000000000-0x000000002fffffff]
> >>   DMA32    empty
> >>   Normal   empty
> >> Movable zone start for each node
> >> Early memory node ranges
> >>   node   0: [mem 0x0000000000000000-0x000000002fffffff]
> >> Initmem setup node 0 [mem 0x0000000000000000-0x000000002fffffff]
> >> percpu: Embedded 25 pages/cpu s64800 r8192 d29408 u102400
> >> pcpu-alloc: s64800 r8192 d29408 u102400 alloc=25*4096
> >> pcpu-alloc: [0] 0
> >> Detected VIPT I-cache on CPU0
> >> CPU features: GIC system register CPU interface present but disabled by higher exception level
> >> CPU features: detected: ARM erratum 845719
> >> alternatives: applying boot alternatives
> >> Kernel command line: console=ttyS0,115200n8 earlycon=dw-apb-uart,0xd000307000
> >> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
> >> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
> >> Built 1 zonelists, mobility grouping on. Total pages: 193536
> >> mem auto-init: stack:off, heap alloc:off, heap free:off
> >> software IO TLB: area num 1.
> >> software IO TLB: mapped [mem 0x000000002b080000-0x000000002f080000] (64MB)
> >> Memory: 689240K/786432K available (5824K kernel code, 1186K rwdata, 1612K rodata, 1600K init, 400K bss, 97192K reserved, 0K cma-reserved)
> >> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> >> trace event string verifier disabled
> >> rcu: Hierarchical RCU implementation.
> >> rcu: RCU event tracing is enabled.
> >> rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=1.
> >> rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> >> rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
> >> NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
> >> GICv3: 96 SPIs implemented
> >> GICv3: 0 Extended SPIs implemented
> >> Root IRQ handler: gic_handle_irq
> >> GICv3: GICv3 features: 16 PPIs
> >> GICv3: CPU0: found redistributor 0 region 0:0x000000e000060000
> >> GICv3: redistributor failed to wakeup...
> >> GICv3: GIC: unable to set SRE (disabled at EL2), panic ahead
> >
> > I think the two messages above are the essential ones.
>
> +1
>
> > Maybe it helps to check
> >
> > https://www.kernel.org/doc/html/v5.3/arm64/booting.html
> >
> > In the middle of that page, in the "Call the kernel image" section, it has something about GIC:
> >
> > -- cut --
> > If the kernel is entered at EL1:
> >
> > ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1
> > ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1.
> > -- cut --
>
> Also, maybe it makes sense to check your firmware (bootloader, ATF?) ... maybe there is some setting missing for your SoC/board?
>
> bye,
> Heiko
> >
> >
> >> Internal error: Oops - Undefined instruction: 0000000062383019 [#1] SMP
> >> Modules linked in:
> >> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.5.0 #107
> >> Hardware name: Pliops Spider MK-I EVK (DT)
> >> pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> pc : gic_cpu_sys_reg_init+0x58/0x2e4
> >> lr : gic_cpu_sys_reg_init+0x2a4/0x2e4
> >> sp : ffff8000808f3b40
> >> x29: ffff8000808f3b40 x28: 0000000000000000 x27: 0000000000000001
> >> x26: ffff000000016040 x25: 0000000000000000 x24: ffff800080a6b000
> >> x23: ffff8000808fc320 x22: ffff8000809cc000 x21: ffff00002fe74670
> >> x20: ffff800080a90000 x19: 0000000000000000 x18: fffffffffffe0b10
> >> x17: ffff8000809f9480 x16: fffffc0000002248 x15: ffff80008090af28
> >> x14: fffffffffffc0b0f x13: 6461656861206369 x12: 6e6170202c29324c
> >> x11: 452074612064656c x10: 6261736964282045 x9 : 6428204552532074
> >> x8 : ffff80008090af28 x7 : ffff8000808f3970 x6 : 000000000000000c
> >> x5 : 000000000000002a x4 : 0000000000000000 x3 : 0000000000000000
> >> x2 : 0000000000000000 x1 : ffff8000808fd0c0 x0 : 000000000000003c
> >> Call trace:
> >>  gic_cpu_sys_reg_init+0x58/0x2e4
> >>  gic_cpu_init.part.0+0xa8/0x114
> >>  gic_init_bases+0x408/0x684
> >>  gic_of_init+0x298/0x300
> >>  of_irq_init+0x1c8/0x368
> >>  irqchip_init+0x14/0x1c
> >>  init_IRQ+0x98/0xac
> >>  start_kernel+0x250/0x5b8
> >>  __primary_switched+0xb4/0xbc
> >> Code: 9260df39 d3441f33 d538cca0 36001180 (d538cc80)
> >> ---[ end trace 0000000000000000 ]---
> >> Kernel panic - not syncing: Attempted to kill the idle task!
> >> ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
> >>
> >>
> >> The kernel panic is related to the GIC distributor (currently under debug) but, AFAIU,
> >> this has nothing to do with the UART not working in the early stages.
> >
> >
> > Yes, I agree. The GIC issue and the UART (at least the polling mode) should be independent.
> >
> > Best regards
> >
> > Dirk
> >
> >
> >> Thanks in advance for your advice,
> >> Cheers,
> >> Lior.
> >>
> >>
> >>> -----Original Message-----
> >>> From: Heiko Schocher <hs@xxxxxxx>
> >>> Sent: Thursday, December 21, 2023 1:37 PM
> >>> To: Lior Weintraub <liorw@xxxxxxxxxx>
> >>> Cc: Dirk Behme <dirk.behme@xxxxxxxxx>; linux-embedded@xxxxxxxxxxxxxxx
> >>> Subject: Re: Debugging early SError exception
> >>>
> >>> Hi Lior,
> >>>
> >>> On 21.12.23 12:19, Dirk Behme wrote:
> >>>> On 21.12.23 at 11:04, Lior Weintraub wrote:
> >>>>> Thanks Dirk,
> >>>>>
> >>>>> Regarding the earlyprintk, I'm not sure I know how to make it work.
> >>>>> I have defined CONFIG_EARLY_PRINTK=y and CONFIG_DEBUG_LL=y in my config but it doesn't seem to work.
> >>>>> Do I need to pass something in the bootargs from the U-BOOT?
> >>>>> Do I need to add that into my device tree?
> >>>>> (Tried to set bootargs = "console=ttyS0,115200 earlyprintk"; under "chosen" in my DT but it didn't work)
> >>>>
> >>>> Yes, what has to be enabled, what not, and what has to be set how is often confusing. I think this
> >>>> is not common to all systems, so to be on the safe side you have to look into the code for
> >>>> your system. Or, in short: the code is the documentation ;)
> >>>>
> >>>>
> >>>>> The UART I am using is "snps,dw-apb-uart".
> >>>>>
> >>>>> Last week, to output the early logs, I implemented this hack:
> >>>>> 1. Modify the printk macro to run my print_func.
> >>>>> 2. This print_func wrote the characters into a single global variable (u32 simul_uart;).
> >>>>> 3. Get the address of this global variable and extract all writes to it from the Tarmac logs.
> >>>>>
> >>>>> This is a very slow and tedious process but it helped me identify the initial SError.
> >>>>> Initially I thought I could write directly into the UART FIFO register (whose address I know)
> >>>>> but this didn't work because Linux had already set up the MMU, so I guess I need to know the virtual
> >>>>> address of this FIFO.
> >>>>> Do I need to use __phys_to_virt of some sort?
> >>>>
> >>>> Yes, I think so. Have a look at the existing serial driver, too. It should do what's needed, and you
> >>>> can borrow that, then.
> >>>
> >>> If you have access to the RAM after the crash (through a debugger or in
> >>> your bootloader) and your mem is stable, find out the address of __log_buf
> >>> in System.map. That's the buffer printk writes into, so dumping
> >>> its content gives you what you would see if the UART worked...
> >>>
> >>> Hope it helps!
> >>>
> >>> bye,
> >>> Heiko
> >>>>
> >>>> Best regards
> >>>>
> >>>> Dirk
> >>>>
> >>>>
> >>>>> Cheers,
> >>>>> Lior.
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Dirk Behme <dirk.behme@xxxxxxxxx>
> >>>>>> Sent: Thursday, December 21, 2023 10:30 AM
> >>>>>> To: Lior Weintraub <liorw@xxxxxxxxxx>; linux-embedded@xxxxxxxxxxxxxxx
> >>>>>> Subject: Re: Debugging early SError exception
> >>>>>>
> >>>>>> On 21.12.23 at 08:43, Lior Weintraub wrote:
> >>>>>>> Hi Dirk,
> >>>>>>>
> >>>>>>> We found that the issue was in the early stages of Barebox (a.k.a. U-BOOT v2).
> >>>>>>
> >>>>>> Glad to hear that! :)
> >>>>>>
> >>>>>>> Our implementation of putc_ll (in debug_ll) was writing into the UART Tx FIFO without checking if the FIFO is full.
> >>>>>>> Once the FIFO got full it caused this SError, probably because the UART IP generated an apberror signal.
> >>>>>>
> >>>>>> Thanks for the report!
> >>>>>>
> >>>>>>> Now Linux is running and doesn't report the SError again, but now we face another issue.
> >>>>>>> We see that the PC is getting into a "report_bug" function.
> >>>>>>> Linux doesn't print anything to the UART (probably since it hasn't got to the point where the console is configured?).
> >>>>>>
> >>>>>> For cases like this, using earlyprintk is usually a good option. Check
> >>>>>> whether the Linux kernel serial console (UART) driver of your SoC
> >>>>>> supports it. In the end it should be "just" a function in the serial
> >>>>>> console driver which outputs the console data via polling before
> >>>>>> (later) the interrupt-driven console part takes over.
> >>>>>>
> >>>>>> Best regards
> >>>>>>
> >>>>>> Dirk
> >>>>>>
> >>>>>>
> >>>>>>> Since our debug means are limited it can take some time to find the root cause.
> >>>>>>>
> >>>>>>> I will keep you posted and update you on our findings.
> >>>>>>> Love to hear your thoughts,
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Lior.
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Dirk Behme <dirk.behme@xxxxxxxxx>
> >>>>>>>> Sent: Tuesday, December 19, 2023 3:37 PM
> >>>>>>>> To: Lior Weintraub <liorw@xxxxxxxxxx>; linux-embedded@xxxxxxxxxxxxxxx
> >>>>>>>> Subject: Re: Debugging early SError exception
> >>>>>>>>
> >>>>>>>> On 19.12.23 at 14:23, Lior Weintraub wrote:
> >>>>>>>>> Thanks Dirk,
> >>>>>>>>
> >>>>>>>> Welcome :)
> >>>>>>>>
> >>>>>>>> In case you find the root cause it would be nice to get some generic
> >>>>>>>> description of it so that we can learn something :)
> >>>>>>>>
> >>>>>>>> Best regards
> >>>>>>>>
> >>>>>>>> Dirk
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Dirk Behme <dirk.behme@xxxxxxxxx>
> >>>>>>>>>> Sent: Tuesday, December 19, 2023 9:09 AM
> >>>>>>>>>> To: Lior Weintraub <liorw@xxxxxxxxxx>; linux-embedded@xxxxxxxxxxxxxxx
> >>>>>>>>>> Subject: Re: Debugging early SError exception
> >>>>>>>>>>
> >>>>>>>>>> On 17.12.23 at 22:32, Lior Weintraub wrote:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> We have a new SoC with an eLinux port (kernel v6.5).
> >>>>>>>>>>> This SoC is an ARM64 (A53) single-core device.
> >>>>>>>>>>> It runs correctly on QEMU but fails with an SError on the emulation platform (Synopsys Zebu running our SoC model).
> >>>>>>>>>>> There is no debugger connected to this emulation but there are several debug capabilities we can use:
> >>>>>>>>>>> 1. Generating a wave dump of CPU signals
> >>>>>>>>>>> 2. Generating a Tarmac log
> >>>>>>>>>>> 3. UART
> >>>>>>>>>>>
> >>>>>>>>>>> Since the SError happens at the early stages of Linux boot, the UART is not enabled yet.
> >>>>>>>>>>> From the Tarmac log we can see:
> >>>>>>>>>>> 3824884521 ps ES (ffff800080760888:d65f03c0) O el1h_ns: ret (parse_early_param)
> >>>>>>>>>>> 3824884522 ps ES (ffff800080763a60:d2801800) O el1h_ns: mov x0, #0xc0 // #192 (setup_arch)
> >>>>>>>>>>>   R X0 (AARCH64) 00000000 000000c0
> >>>>>>>>>>> 3824884523 ps ES (ffff800080763a64:d51b4220) O el1h_ns: msr daif, x0 (setup_arch)
> >>>>>>>>>>>   R CPSR 600000c5
> >>>>>>>>>>> 3824884529 ps ES System Error (Abort)
> >>>>>>>>>>>   EXC [0x380] SError/vSError Current EL with SP_ELx
> >>>>>>>>>>>   R ESR_EL1 (AARCH64) bf000002
> >>>>>>>>>>>   R CPSR 600003c5
> >>>>>>>>>>>   R SPSR_EL1 (AARCH64) 600000c5
> >>>>>>>>>>>   R ELR_EL1 (AARCH64) ffff8000 80763a68
> >>>>>>>>>>> 3824884925 ps ES (ffff800080010b80:d10543ff) O el1h_ns: sub sp, sp, #0x150 (vectors)
> >>>>>>>>>>>   R SP_EL1 (AARCH64) ffff8000 808f3c50
> >>>>>>>>>>> 3824884925 ps ES (ffff800080010b84:8b2063ff) O el1h_ns: add sp, sp, x0 (vectors)
> >>>>>>>>>>>   R SP_EL1 (AARCH64) ffff8000 808f3d10
> >>>>>>>>>>> 3824884926 ps ES (ffff800080010b88:cb2063e0) O el1h_ns: sub x0, sp, x0 (vectors)
> >>>>>>>>>>>   R X0 (AARCH64) ffff8000 808f3c50
> >>>>>>>>>>> 3824884927 ps ES (ffff800080010b8c:37700080) O el1h_ns: tbnz w0, #14, ffff800080010b9c <vectors+0x39c> (vectors)
> >>>>>>>>>>> 3824884935 ps ES (ffff800080010b90:cb2063e0) O el1h_ns: sub x0, sp, x0 (vectors)
> >>>>>>>>>>>   R X0 (AARCH64) 00000000 000000c0
> >>>>>>>>>>> 3824884937 ps ES (ffff800080010b94:cb2063ff) O el1h_ns: sub sp, sp, x0 (vectors)
> >>>>>>>>>>>   R SP_EL1 (AARCH64) ffff8000 808f3c50
> >>>>>>>>>>> 3824884938 ps ES (ffff800080010b98:140001ef) O el1h_ns: b ffff800080011354 <el1h_64_error> (vectors)
> >>>>>>>>>>>
> >>>>>>>>>>> If I understand correctly, the exception happened sometime earlier
> >>>>>>>>>>> and only now the Linux boot code (setup_arch) opened the exception handling and, as a
> >>>>>>>>>>> result, we immediately jump to the SError exception handler.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yes, that sounds reasonable. If I understood correctly, you are
> >>>>>>>>>> running something "quite new" on some software (QEMU) and hardware
> >>>>>>>>>> (Synopsys) simulators.
> >>>>>>>>>>
> >>>>>>>>>> That would mean that you have new hardware with e.g. a new memory map
> >>>>>>>>>> not used before. What you describe sounds like something in the code before
> >>>>>>>>>> Linux (the boot loader) is resulting in the SError. This
> >>>>>>>>>> might be an access to non-existing or non-enabled hardware. I.e. it
> >>>>>>>>>> might be that you try to access (read/write) an address which is not
> >>>>>>>>>> available yet (or just invalid). It's hard to debug that. In case you
> >>>>>>>>>> are able to modify the code before Linux (the boot loader?) you might
> >>>>>>>>>> try to enable SError exceptions there, too, to get the exception earlier and
> >>>>>>>>>> with that make the search window smaller. I'm not that familiar with
> >>>>>>>>>> QEMU, but could you try to trace which (all?) hardware accesses your
> >>>>>>>>>> code does, and with that analyse all accesses and check whether
> >>>>>>>>>> all of them are valid even on the hardware (Synopsys) emulation
> >>>>>>>>>> system? That should be checked from both a valid-address and a hardware
> >>>>>>>>>> subsystem enablement point of view.
> >>>>>>>>>>
> >>>>>>>>>> Hth,
> >>>>>>>>>>
> >>>>>>>>>> Dirk
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> From the Linux source:
> >>>>>>>>>>> parse_early_param();
> >>>>>>>>>>>
> >>>>>>>>>>> dynamic_scs_init();
> >>>>>>>>>>>
> >>>>>>>>>>> /*
> >>>>>>>>>>>  * Unmask asynchronous aborts and fiq after bringing up possible
> >>>>>>>>>>>  * earlycon.
> >>>>>>>>>>>  * (Report possible System Errors once we can report this
> >>>>>>>>>>>  * occurred).
> >>>>>>>>>>>  */
> >>>>>>>>>>> local_daif_restore(DAIF_PROCCTX_NOIRQ); <---- This is when we get the exception.
> >>>>>>>>>>>
> >>>>>>>>>>> After some kernel hacking (replacing printk) we could extract the logs:
> >>>>>>>>>>> 6Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> >>>>>>>>>>> 5Linux version 6.5.0 (pliops@dev-liorw) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2023.02.1-95-g8391404e23) 11.3.0, GNU ld (GNU Binutils) 2.38) #101 SMP Sun Dec 17 20:09:06 IST 2023
> >>>>>>>>>>> 6Machine model: Pliops Spider MK-I EVK
> >>>>>>>>>>> 2SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
> >>>>>>>>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0 #101
> >>>>>>>>>>> Hardware name: Pliops Spider MK-I EVK (DT)
> >>>>>>>>>>> pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>>>>>>>>>> pc : setup_arch+0x13c/0x5ac
> >>>>>>>>>>> lr : setup_arch+0x134/0x5ac
> >>>>>>>>>>> sp : ffff8000808f3da0
> >>>>>>>>>>> x29: ffff8000808f3da0c x28: 0000000008758074c x27: 0000000005e31b58c
> >>>>>>>>>>> x26: 0000000000000001c x25: 0000000007e5f728c x24: ffff8000808f8000c
> >>>>>>>>>>> x23: ffff8000808f8600c x22: ffff8000807b6000c x21: ffff800080010000c
> >>>>>>>>>>> x20: ffff800080a1e000c x19: fffffbfffddfe190c x18: 000000002266684ac
> >>>>>>>>>>> x17: 00000000fcad60bbc x16: 0000000000001800c x15: 0000000000000008c
> >>>>>>>>>>> x14: ffffffffffffffffc x13: 0000000000000000c x12: 0000000000000003c
> >>>>>>>>>>> x11: 0101010101010101c x10: ffffffffffee87dfc x9 : 0000000000000038c
> >>>>>>>>>>> x8 : 0101010101010101c x7 : 7f7f7f7f7f7f7f7fc x6 : 0000000000000001c
> >>>>>>>>>>> x5 : 0000000000000000c x4 : 8000000000000000c x3 : 0000000000000065c
> >>>>>>>>>>> x2 : 0000000000000000c x1 : 0000000000000000c x0 : 00000000000000c0c
> >>>>>>>>>>> 0Kernel panic - not syncing: Asynchronous SError Interrupt
> >>>>>>>>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0 #101
> >>>>>>>>>>> Hardware name: Pliops Spider MK-I EVK (DT)
> >>>>>>>>>>> Call trace:
> >>>>>>>>>>>  dump_backtrace+0x9c/0xd0
> >>>>>>>>>>>  show_stack+0x14/0x1c
> >>>>>>>>>>>  dump_stack_lvl+0x44/0x58
> >>>>>>>>>>>  dump_stack+0x14/0x1c
> >>>>>>>>>>>  panic+0x2e0/0x33c
> >>>>>>>>>>>  nmi_panic+0x68/0x6c
> >>>>>>>>>>>  arm64_serror_panic+0x68/0x78
> >>>>>>>>>>>  do_serror+0x24/0x54
> >>>>>>>>>>>  el1h_64_error_handler+0x2c/0x40
> >>>>>>>>>>>  el1h_64_error+0x64/0x68
> >>>>>>>>>>>  setup_arch+0x13c/0x5ac
> >>>>>>>>>>>  start_kernel+0x5c/0x5b8
> >>>>>>>>>>>  __primary_switched+0xb4/0xbc
> >>>>>>>>>>> 0---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
> >>>>>>>>>>>
> >>>>>>>>>>> Can you please advise how to proceed with debugging?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks in advance,
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>> Lior.
> >>>>>>>>>>>
> >>>
> >>> --
> >>> DENX Software Engineering GmbH, Managing Director: Erika Unter
> >>> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> >>> Phone: +49-8142-66989-52 Fax: +49-8142-66989-80 Email: hs@xxxxxxx
>
> --
> DENX Software Engineering GmbH, Managing Director: Erika Unter
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: +49-8142-66989-52 Fax: +49-8142-66989-80 Email: hs@xxxxxxx
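
For reference, the syndrome value reported for the SError in the oldest message of this thread (ESR_EL1 = 0xbf000002) can be split into its architectural fields. The Python sketch below is only an illustration and not something posted in the thread: the names decode_esr and serror_ids are hypothetical, and it relies only on the generic Armv8-A layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], with ISS bit 24 acting as the IDS flag for an SError). What the remaining ISS bits mean when IDS is set is implementation defined, so the code deliberately does not interpret them; the CPU's technical reference manual would be the place to look.

```python
from typing import NamedTuple

EC_SERROR = 0x2F  # architectural exception class for an SError interrupt


class Esr(NamedTuple):
    ec: int   # exception class, bits [31:26]
    il: int   # instruction length bit, bit 25
    iss: int  # instruction specific syndrome, bits [24:0]


def decode_esr(esr: int) -> Esr:
    """Split a 32-bit ESR_ELx value into its EC, IL and ISS fields."""
    return Esr(ec=(esr >> 26) & 0x3F, il=(esr >> 25) & 0x1, iss=esr & 0x1FFFFFF)


def serror_ids(iss: int) -> bool:
    """For an SError, ISS bit 24 (IDS) set means an implementation-defined syndrome."""
    return bool((iss >> 24) & 0x1)


if __name__ == "__main__":
    # Value taken from the Tarmac log and the panic message quoted in the thread.
    esr = decode_esr(0xBF000002)
    print(f"EC=0x{esr.ec:02x} IL={esr.il} ISS=0x{esr.iss:07x}")
    if esr.ec == EC_SERROR:
        print("SError, implementation-defined syndrome:", serror_ids(esr.iss))
```

The decode only confirms that the exception is an SError with an implementation-defined syndrome; it says nothing by itself about which bus access triggered it.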
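
One detail in the newest boot log at the top of this thread may be worth cross-checking: the kernel prints "arch_timer: cp15 timer(s) running at 62.50MHz" and "lpj=250000", while the message says CNTFRQ_EL0 is set to 2M. The Python sketch below only does the arithmetic for that comparison. It assumes (not confirmed anywhere in the thread) that lpj was derived as timer frequency / HZ, and that the counter really increments at about 2MHz; the function names are illustrative. If both assumptions hold, each scheduler tick would take far longer in wall-clock time than the kernel expects, so the frequency the kernel picked up (from CNTFRQ_EL0 or from a clock-frequency property on the timer node in the device tree) might be a useful thing to verify.

```python
def implied_hz(timer_hz: float, lpj: int) -> float:
    """HZ implied by lpj when the delay loop is derived from the timer frequency."""
    return timer_hz / lpj


def tick_wall_time(reported_hz: float, actual_hz: float, hz: float) -> float:
    """Wall-clock seconds per scheduler tick when the counter runs at actual_hz
    but the kernel programs the timer as if it ran at reported_hz."""
    cycles_per_tick = reported_hz / hz
    return cycles_per_tick / actual_hz


if __name__ == "__main__":
    REPORTED_HZ = 62_500_000  # from "arch_timer: ... running at 62.50MHz"
    LPJ = 250_000             # from "(lpj=250000)"
    ACTUAL_HZ = 2_000_000     # assumption: the "2M" mentioned for CNTFRQ_EL0

    hz = implied_hz(REPORTED_HZ, LPJ)
    wall = tick_wall_time(REPORTED_HZ, ACTUAL_HZ, hz)
    print(f"implied HZ: {hz:.0f}")
    print(f"one tick: {1000 / hz:.1f} ms expected, {1000 * wall:.1f} ms wall-clock")
    print(f"slowdown factor: {REPORTED_HZ / ACTUAL_HZ:.2f}")
```

This is a consistency check on the numbers in the log, not a diagnosis: a missing timer interrupt at the GIC would produce the same WFI symptom.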