RE: Debugging early SError exception

Hi,

The GICv3 issue was resolved after:
1. Setting bit 0 (SRE) and bit 3 (Enable) in ICC_SRE_EL3 (we don't have virtualization support and hence ICC_SRE_EL2 is not available).
2. Powering up the GICR (redistributor) at EL3.
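
For reference, the value behind step 1 can be sketched as below. The bit positions are those of ICC_SRE_EL3 in the Arm GICv3 architecture; the helper name is made up for illustration, and the real write is an msr executed by EL3 firmware, which this host-side snippet only models.

```python
# Bit positions in ICC_SRE_EL3 (Arm GICv3 architecture).
ICC_SRE_EL3_SRE = 1 << 0     # use the system register interface at EL3
ICC_SRE_EL3_DFB = 1 << 1     # disable FIQ bypass (not touched here)
ICC_SRE_EL3_DIB = 1 << 2     # disable IRQ bypass (not touched here)
ICC_SRE_EL3_ENABLE = 1 << 3  # allow lower ELs to access ICC_SRE_EL1/EL2


def icc_sre_el3_with_fix(current: int) -> int:
    """Hypothetical helper: return `current` with SRE and Enable set,
    leaving all other bits as they were."""
    return current | ICC_SRE_EL3_SRE | ICC_SRE_EL3_ENABLE


print(hex(icc_sre_el3_with_fix(0x0)))
```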

The earlycon issue was resolved after:
1. Adding "earlycon=uart8250,mmio32,0xd000307000,115200n8" to the boot args.
2. Adding "CONFIG_SERIAL_8250_CONSOLE=y" to the config (previously it had only CONFIG_SERIAL_8250=y).
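
To make the fields of that earlycon string explicit, here is a small illustrative parser. It assumes the "<driver>,<iotype>,<addr>[,<options>]" layout with options of the form "<baud><parity><bits>"; the function name is hypothetical and this is not the kernel's own parsing code.

```python
import re


def parse_earlycon(arg: str) -> dict:
    """Split 'earlycon=<driver>,<iotype>,<addr>[,<options>]' into fields.

    Illustrative only: handles just the uart8250-style form used above.
    """
    value = arg.split("=", 1)[1]
    driver, iotype, addr, *rest = value.split(",")
    fields = {"driver": driver, "iotype": iotype, "addr": int(addr, 16)}
    if rest:
        # Options look like "115200n8": baud rate, parity, data bits.
        m = re.fullmatch(r"(\d+)([noe])(\d)", rest[0])
        if m:
            fields.update(baud=int(m[1]), parity=m[2], bits=int(m[3]))
    return fields


print(parse_earlycon("earlycon=uart8250,mmio32,0xd000307000,115200n8"))
```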

Now I face a new issue:
The Linux boot hangs at the "wait for interrupt" (wfi) instruction in cpu_do_idle.

The program counter is stuck at 0xffff8000805ae45c, the instruction right after the wfi:
ffff8000805ae454 <cpu_do_idle>:
ffff8000805ae454:       d5033f9f        dsb     sy
ffff8000805ae458:       d503207f        wfi
ffff8000805ae45c:       d65f03c0        ret

I think that something is wrong with the timer or GIC settings, and as a result the scheduler doesn't get its interrupts (timer ticks).

Additional info that might be relevant to this issue:
The emulation platform runs at about 2.8MHz.
CNTFRQ_EL0 is set to 2MHz (because the running frequency of the emulation platform varies between 1.9MHz and 2.8MHz).
These settings are meant to let Linux keep time as it would in the "real" world.
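
As a rough cross-check of these numbers (a sketch, not a diagnosis): the boot log below reports "arch_timer: cp15 timer(s) running at 62.50MHz" and "lpj=250000", which corresponds to HZ=250 if lpj is the timer rate divided by HZ. Should the system counter really increment at about 2MHz, every timeout the kernel programs would take 62.5/2 times longer than intended. The function name and the 2MHz counter rate are assumptions for illustration.

```python
def tick_stretch(assumed_hz: int, actual_hz: int, kernel_hz: int):
    """Hypothetical helper: how long one scheduler tick really takes when
    the kernel believes the counter runs at `assumed_hz` but it actually
    increments at `actual_hz`."""
    cycles_per_tick = assumed_hz // kernel_hz      # what the kernel programs
    actual_seconds = cycles_per_tick / actual_hz   # how long that really lasts
    slowdown = assumed_hz / actual_hz              # stretch factor for any timeout
    return cycles_per_tick, actual_seconds, slowdown


ASSUMED_HZ = 62_500_000   # from "running at 62.50MHz" in the log
LPJ = 250_000             # from "lpj=250000" in the log
KERNEL_HZ = ASSUMED_HZ // LPJ   # 250, assuming lpj = timer rate / HZ
ACTUAL_HZ = 2_000_000     # assumption: counter really ticks at ~2MHz

cycles, seconds, slowdown = tick_stretch(ASSUMED_HZ, ACTUAL_HZ, KERNEL_HZ)
print(f"HZ={KERNEL_HZ}, {cycles} cycles per tick, "
      f"{seconds * 1000:.1f} ms per tick instead of {1000 / KERNEL_HZ:.1f} ms "
      f"({slowdown:.2f}x slower)")
```

If that mismatch is real, possible sources would be a clock-frequency property in the timer node of the device tree, or CNTFRQ_EL0 not holding 2MHz at the time the kernel reads it. Both are guesses that need to be verified.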

It is my understanding that there are 2 issues here:
1. Something is wrong with the timer/interrupt settings (note that the same configuration runs correctly on QEMU).
2. Something is wrong with the initramfs: according to the kernel source, it seems to fail to open "/dev/console".
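
One way to narrow down the second point is to check whether the initramfs image actually contains a dev/console character device. The sketch below walks an uncompressed "newc" cpio archive; the function names are made up for illustration, and a compressed image has to be decompressed first. Note also that the log below shows no line announcing ttyS0 as a registered port or console, which would be another possible reason for the open to fail.

```python
import stat
import sys

HEADER_LEN = 110  # 6-byte magic + 13 fields of 8 hex digits each


def _align4(n: int) -> int:
    return (n + 3) & ~3


def cpio_newc_entries(data: bytes):
    """Yield (name, mode) for each entry of an uncompressed newc cpio image,
    stopping at the first TRAILER!!! record."""
    off = 0
    while off + HEADER_LEN <= len(data):
        hdr = data[off:off + HEADER_LEN]
        if hdr[:6] not in (b"070701", b"070702"):
            raise ValueError(f"bad cpio magic at offset {off}")
        fields = [int(hdr[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        mode, filesize, namesize = fields[1], fields[6], fields[11]
        name_start = off + HEADER_LEN
        # namesize includes the terminating NUL byte.
        name = data[name_start:name_start + namesize - 1].decode()
        if name == "TRAILER!!!":
            return
        yield name, mode
        off = _align4(name_start + namesize)  # header + name padded to 4 bytes
        off = _align4(off + filesize)         # file data padded to 4 bytes


def has_console_node(data: bytes) -> bool:
    """True if the archive holds a character device named dev/console."""
    for name, mode in cpio_newc_entries(data):
        if name.startswith("./"):
            name = name[2:]
        if name.lstrip("/") == "dev/console" and stat.S_ISCHR(mode):
            return True
    return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_console.py <uncompressed-initramfs.cpio>
    with open(sys.argv[1], "rb") as f:
        print("dev/console present:", has_console_node(f.read()))
```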

The full Linux boot log:
Booting Linux on physical CPU 0x0000000000 [0x410fd034]
Linux version 6.5.0 (pliops@dev-liorw) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2023.02.1-95-g8391404e23) 11.3.0, GNU ld (GNU Binutils) 2.38) #112 SMP Sun Dec 24 15:44:56 IST 2023
Machine model: Pliops Spider MK-I EVK
earlycon: uart8250 at MMIO32 0x000000d000307000 (options '115200n8')
printk: bootconsole [uart8250] enabled
efi: UEFI not found.
Zone ranges:
  DMA      [mem 0x0000000000000000-0x000000002fffffff]
  DMA32    empty
  Normal   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000000000000-0x000000002fffffff]
Initmem setup node 0 [mem 0x0000000000000000-0x000000002fffffff]
percpu: Embedded 25 pages/cpu s64800 r8192 d29408 u102400
Detected VIPT I-cache on CPU0
CPU features: detected: GIC system register CPU interface
CPU features: detected: ARM erratum 845719
alternatives: applying boot alternatives
Kernel command line: console=ttyS0,115200n8 earlycon=uart8250,mmio32,0xd000307000,115200n8
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
Built 1 zonelists, mobility grouping on.  Total pages: 193536
mem auto-init: stack:off, heap alloc:off, heap free:off
software IO TLB: area num 1.
software IO TLB: mapped [mem 0x000000002b080000-0x000000002f080000] (64MB)
Memory: 689240K/786432K available (5824K kernel code, 1186K rwdata, 1612K rodata, 1600K init, 400K bss, 97192K reserved, 0K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
trace event string verifier disabled
rcu: Hierarchical RCU implementation.
rcu:    RCU event tracing is enabled.
rcu:    RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=1.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
GICv3: 96 SPIs implemented
GICv3: 0 Extended SPIs implemented
Root IRQ handler: gic_handle_irq
GICv3: GICv3 features: 16 PPIs
GICv3: CPU0: found redistributor 0 region 0:0x000000e000060000
ITS [mem 0xe000040000-0xe00005ffff]
ITS@0x000000e000040000: allocated 8192 Devices @a0000 (indirect, esz 8, psz 64K, shr 1)
ITS@0x000000e000040000: allocated 32768 Interrupt Collections @b0000 (flat, esz 2, psz 64K, shr 1)
GICv3: Expected reserved range [0x00000000000c0000:0x00000000000cffff], not found
GICv3: using LPI property table @0x00000000000c0000
GICv3: CPU0: Booted with LPIs enabled, memory probably corrupted
CPU0: Failed to disable LPIs
rcu: srcu_init: Setting srcu_struct sizes based on contention.
arch_timer: cp15 timer(s) running at 62.50MHz (virt).
clocksource: arch_sys_counter: mask: 0x1ffffffffffffff max_cycles: 0x1cd42e208c, max_idle_ns: 881590405314 ns
sched_clock: 57 bits at 63MHz, resolution 16ns, wraps every 4398046511096ns
Console: colour dummy device 80x25
Calibrating delay loop (skipped), value calculated using timer frequency.. 125.00 BogoMIPS (lpj=250000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
cacheinfo: Unable to detect cache hierarchy for CPU 0
rcu: Hierarchical SRCU implementation.
rcu:    Max phase no-delay instances is 1000.
Platform MSI: gic-its@E000040000 domain created
PCI/MSI: /soc/interrupt-controller@E000000000/gic-its@E000040000 domain created
EFI services will not be available.
smp: Bringing up secondary CPUs ...
smp: Brought up 1 node, 1 CPU
SMP: Total of 1 processors activated.
CPU features: detected: 32-bit EL0 Support
CPU features: detected: CRC32 instructions
CPU: All CPU(s) started at EL1
alternatives: applying system-wide alternatives
devtmpfs: initialized
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
futex hash table entries: 256 (order: 2, 16384 bytes, linear)
DMI not present or invalid.
DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
ASID allocator initialised with 65536 entries
Serial: AMBA PL011 UART driver
Modules: 30080 pages in range for non-PLT usage
Modules: 521600 pages in range for PLT usage
iommu: Default domain type: Translated
iommu: DMA domain TLB invalidation policy: strict mode
SCSI subsystem initialized
vgaarb: loaded
clocksource: Switched to clocksource arch_sys_counter
PCI: CLS 0 bytes, default 64
workingset: timestamp_bits=46 max_order=18 bucket_order=0
fuse: init (API version 7.38)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
io scheduler mq-deadline registered
io scheduler kyber registered
Unpacking initramfs...
Freeing initrd memory: 4596K
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
clk: Disabling unused clocks
Warning: unable to open an initial console.
Freeing unused kernel memory: 1600K

Thanks in advance for your great advice and support,
Cheers,
Lior.

> -----Original Message-----
> From: Heiko Schocher <hs@xxxxxxx>
> Sent: Friday, December 22, 2023 10:04 AM
> To: Dirk Behme <dirk.behme@xxxxxxxxx>; Lior Weintraub <liorw@xxxxxxxxxx>
> Cc: linux-embedded@xxxxxxxxxxxxxxx
> Subject: Re: Debugging early SError exception
> 
> Hello Dirk, Lior,
> 
> On 22.12.23 08:48, Dirk Behme wrote:
> > Am 22.12.23 um 08:03 schrieb Lior Weintraub:
> >> Hi,
> >>
> >> I managed to dump the __log_buf but for some reason the UART is still not working.
> >> Please note that the UART printed all the U-BOOT traces so, AFAIU, the device tree is set correctly.
> >> (Barebox is passing its DTB to the kernel.)
> >>
> >> To enable earlyprintk I have:
> >> 1. Compiled the kernel with CONFIG_EARLY_PRINTK=y and CONFIG_DEBUG_LL=y
> >> 2. Modified the boot args to include: "console=ttyS0,115200n8 earlycon=dw-apb-uart,0xd000307000"
> >> 3. Verified that the dw-apb-uart driver (8250_early.c) supports earlycon:
> >> OF_EARLYCON_DECLARE(uart, "snps,dw-apb-uart", early_serial8250_setup);
> >>
> >> From the __log_buf dump:
> >> Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> >> Linux version 6.5.0 (pliops@dev-liorw) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2023.02.1-95-g8391404e23) 11.3.0, GNU ld (GNU Binutils) 2.38) #107 SMP Thu Dec 21 17:33:12 IST 2023
> >> Machine model: Pliops Spider MK-I EVK
> >> efi: UEFI not found.
> >> Zone ranges:
> >>    DMA      [mem 0x0000000000000000-0x000000002fffffff]
> >>    DMA32    empty
> >>    Normal   empty
> >> Movable zone start for each node
> >> Early memory node ranges
> >>    node   0: [mem 0x0000000000000000-0x000000002fffffff]
> >> Initmem setup node 0 [mem 0x0000000000000000-0x000000002fffffff]
> >> percpu: Embedded 25 pages/cpu s64800 r8192 d29408 u102400
> >> pcpu-alloc: s64800 r8192 d29408 u102400 alloc=25*4096
> >> pcpu-alloc: [0] 0
> >> Detected VIPT I-cache on CPU0
> >> CPU features: GIC system register CPU interface present but disabled by higher exception level
> >> CPU features: detected: ARM erratum 845719
> >> alternatives: applying boot alternatives
> >> Kernel command line: console=ttyS0,115200n8 earlycon=dw-apb-uart,0xd000307000
> >> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
> >> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
> >> Built 1 zonelists, mobility grouping on.  Total pages: 193536
> >> mem auto-init: stack:off, heap alloc:off, heap free:off
> >> software IO TLB: area num 1.
> >> software IO TLB: mapped [mem 0x000000002b080000-0x000000002f080000] (64MB)
> >> Memory: 689240K/786432K available (5824K kernel code, 1186K rwdata, 1612K rodata, 1600K init, 400K bss, 97192K reserved, 0K cma-reserved)
> >> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> >> trace event string verifier disabled
> >> rcu: Hierarchical RCU implementation.
> >> rcu:     RCU event tracing is enabled.
> >> rcu:     RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=1.
> >> rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> >> rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
> >> NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
> >> GICv3: 96 SPIs implemented
> >> GICv3: 0 Extended SPIs implemented
> >> Root IRQ handler: gic_handle_irq
> >> GICv3: GICv3 features: 16 PPIs
> >> GICv3: CPU0: found redistributor 0 region 0:0x000000e000060000
> >> GICv3: redistributor failed to wakeup...
> >> GICv3: GIC: unable to set SRE (disabled at EL2), panic ahead
> >
> > I think the two messages above are the essential ones.
> 
> +1
> 
> > Maybe it helps to check
> >
> > https://www.kernel.org/doc/html/v5.3/arm64/booting.html
> >
> > In the middle of that page, the "Call the kernel image" section has something about the GIC:
> >
> > -- cut --
> > If the kernel is entered at EL1:
> >
> >         ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1
> >         ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1.
> > -- cut --
> 
> It may also make sense to check your firmware (bootloader, ATF?); maybe
> some setting is missing for your SoC/board?
> 
> bye,
> Heiko
> 
> >
> >> Internal error: Oops - Undefined instruction: 0000000062383019 [#1] SMP
> >> Modules linked in:
> >> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.5.0 #107
> >> Hardware name: Pliops Spider MK-I EVK (DT)
> >> pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> pc : gic_cpu_sys_reg_init+0x58/0x2e4
> >> lr : gic_cpu_sys_reg_init+0x2a4/0x2e4
> >> sp : ffff8000808f3b40
> >> x29: ffff8000808f3b40 x28: 0000000000000000 x27: 0000000000000001
> >> x26: ffff000000016040 x25: 0000000000000000 x24: ffff800080a6b000
> >> x23: ffff8000808fc320 x22: ffff8000809cc000 x21: ffff00002fe74670
> >> x20: ffff800080a90000 x19: 0000000000000000 x18: fffffffffffe0b10
> >> x17: ffff8000809f9480 x16: fffffc0000002248 x15: ffff80008090af28
> >> x14: fffffffffffc0b0f x13: 6461656861206369 x12: 6e6170202c29324c
> >> x11: 452074612064656c x10: 6261736964282045 x9 : 6428204552532074
> >> x8 : ffff80008090af28 x7 : ffff8000808f3970 x6 : 000000000000000c
> >> x5 : 000000000000002a x4 : 0000000000000000 x3 : 0000000000000000
> >> x2 : 0000000000000000 x1 : ffff8000808fd0c0 x0 : 000000000000003c
> >> Call trace:
> >>   gic_cpu_sys_reg_init+0x58/0x2e4
> >>   gic_cpu_init.part.0+0xa8/0x114
> >>   gic_init_bases+0x408/0x684
> >>   gic_of_init+0x298/0x300
> >>   of_irq_init+0x1c8/0x368
> >>   irqchip_init+0x14/0x1c
> >>   init_IRQ+0x98/0xac
> >>   start_kernel+0x250/0x5b8
> >>   __primary_switched+0xb4/0xbc
> >> Code: 9260df39 d3441f33 d538cca0 36001180 (d538cc80)
> >> ---[ end trace 0000000000000000 ]---
> >> Kernel panic - not syncing: Attempted to kill the idle task!
> >> ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
> >>
> >>
> >> The kernel panic is related to the GIC distributor (currently under debug) but, AFAIU,
> >> this has nothing to do with the UART not working in the early stages.
> >
> >
> > Yes, I agree. The GIC issue and the UART (at least in polling mode) should be independent.
> >
> > Best regards
> >
> > Dirk
> >
> >
> >> Thanks in advance for your advice,
> >> Cheers,
> >> Lior.
> >>
> >>
> >>> -----Original Message-----
> >>> From: Heiko Schocher <hs@xxxxxxx>
> >>> Sent: Thursday, December 21, 2023 1:37 PM
> >>> To: Lior Weintraub <liorw@xxxxxxxxxx>
> >>> Cc: Dirk Behme <dirk.behme@xxxxxxxxx>; linux-embedded@xxxxxxxxxxxxxxx
> >>> Subject: Re: Debugging early SError exception
> >>>
> >>> Hi Lior,
> >>>
> >>> On 21.12.23 12:19, Dirk Behme wrote:
> >>>> Am 21.12.23 um 11:04 schrieb Lior Weintraub:
> >>>>> Thanks Dirk,
> >>>>>
> >>>>> Regarding the earlyprintk, I'm not sure I know how to make it work.
> >>>>> I have defined CONFIG_EARLY_PRINTK=y and CONFIG_DEBUG_LL=y in my config but it doesn't seem to work.
> >>>>> Do I need to pass something in the bootargs from the U-BOOT?
> >>>>> Do I need to add that into my device tree?
> >>>>> (I tried to set bootargs = "console=ttyS0,115200 earlyprintk"; under "chosen" in my DT but it didn't work)
> >>>>
> >>>> Yes, what has to be enabled, what not, and what has to be set how is often confusing. I think this
> >>>> is not common to all systems, so to be on the safe side you have to look into the code for
> >>>> your system. Or, in short: the code is the documentation ;)
> >>>>
> >>>>
> >>>>> The UART I am using is "snps,dw-apb-uart".
> >>>>>
> >>>>> Last week, to output the early logs I implemented this hack:
> >>>>> 1. Modify the printk macro to run my print_func
> >>>>> 2. This print_func wrote the characters into a single global variable (u32 simul_uart;)
> >>>>> 3. Get the address of this global variable and extract all writes to it from the Tarmac logs.
> >>>>>
> >>>>> This is a very slow and tedious process but it helped me identify the initial SError.
> >>>>> Initially I thought I could write directly into the UART FIFO register (whose address I know)
> >>>>> but this didn't work because Linux had already set up the MMU, so I guess I need to know the virtual
> >>>>> address of this FIFO.
> >>>>> Do I need to use __phys_to_virt of some sort?
> >>>>
> >>>> Yes, I think so. Have a look at the existing serial driver, too. It should do what's needed, and you
> >>>> can borrow that, then.
> >>>
> >>> If you have access to the RAM after the crash (through a debugger or in
> >>> your bootloader) and your mem is stable, find out the address of __log_buf
> >>> in System.map. That's the buffer printk writes into, so dumping
> >>> its content shows what you would see if the UART worked...
> >>>
> >>> Hope it helps!
> >>>
> >>> bye,
> >>> Heiko
> >>>>
> >>>> Best regards
> >>>>
> >>>> Dirk
> >>>>
> >>>>
> >>>>> Cheers,
> >>>>> Lior.
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Dirk Behme <dirk.behme@xxxxxxxxx>
> >>>>>> Sent: Thursday, December 21, 2023 10:30 AM
> >>>>>> To: Lior Weintraub <liorw@xxxxxxxxxx>; linux-embedded@xxxxxxxxxxxxxxx
> >>>>>> Subject: Re: Debugging early SError exception
> >>>>>>
> >>>>>> Am 21.12.23 um 08:43 schrieb Lior Weintraub:
> >>>>>>> Hi Dirk,
> >>>>>>>
> >>>>>>> We found that the issue was at the early stages of Barebox (a.k.a. U-BOOT v2).
> >>>>>>
> >>>>>> Glad to hear that! :)
> >>>>>>
> >>>>>>> Our implementation of putc_ll (on debug_ll) was writing into the UART Tx FIFO without checking whether the FIFO was full.
> >>>>>>> Once the FIFO got full it caused this SError, probably because the UART IP generated an apberror signal.
> >>>>>>
> >>>>>> Thanks for the report!
> >>>>>>
> >>>>>>> Now Linux is running and doesn't report the SError again, but now we face another issue.
> >>>>>>> We see that the PC is getting into a "report_bug" function.
> >>>>>>> Linux doesn't print anything to the UART (probably since it hasn't got to the point where the console is configured?).
> >>>>>>
> >>>>>> For cases like this using earlyprintk is usually a good option. Check
> >>>>>> whether the Linux kernel serial console (UART) driver of your SoC
> >>>>>> supports it. In the end it should be "just" a function in the serial
> >>>>>> console driver which outputs the console data via polling before
> >>>>>> (later) the interrupt driven console part takes over.
> >>>>>>
> >>>>>> Best regards
> >>>>>>
> >>>>>> Dirk
> >>>>>>
> >>>>>>
> >>>>>>> Since our debug means are limited it can take some time to find the root cause.
> >>>>>>>
> >>>>>>> I will keep you posted and update our findings.
> >>>>>>> Love to hear your thoughts,
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Lior.
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Dirk Behme <dirk.behme@xxxxxxxxx>
> >>>>>>>> Sent: Tuesday, December 19, 2023 3:37 PM
> >>>>>>>> To: Lior Weintraub <liorw@xxxxxxxxxx>; linux-embedded@xxxxxxxxxxxxxxx
> >>>>>>>> Subject: Re: Debugging early SError exception
> >>>>>>>>
> >>>>>>>> Am 19.12.23 um 14:23 schrieb Lior Weintraub:
> >>>>>>>>> Thanks Dirk,
> >>>>>>>>
> >>>>>>>> Welcome :)
> >>>>>>>>
> >>>>>>>> In case you find the root cause it would be nice to get some generic
> >>>>>>>> description of it so that we can learn something :)
> >>>>>>>>
> >>>>>>>> Best regards
> >>>>>>>>
> >>>>>>>> Dirk
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Dirk Behme <dirk.behme@xxxxxxxxx>
> >>>>>>>>>> Sent: Tuesday, December 19, 2023 9:09 AM
> >>>>>>>>>> To: Lior Weintraub <liorw@xxxxxxxxxx>; linux-embedded@xxxxxxxxxxxxxxx
> >>>>>>>>>> Subject: Re: Debugging early SError exception
> >>>>>>>>>>
> >>>>>>>>>> Am 17.12.23 um 22:32 schrieb Lior Weintraub:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> We have a new SoC with eLinux porting (kernel v6.5).
> >>>>>>>>>>> This SoC is an ARM64 (A53) single-core device.
> >>>>>>>>>>> It runs correctly on QEMU but fails with an SError on the emulation platform (Synopsys Zebu running our SoC model).
> >>>>>>>>>>> There is no debugger connected to this emulation but there are several debug capabilities we can use:
> >>>>>>>>>>> 1. Generating a wave dump of CPU signals
> >>>>>>>>>>> 2. Generating a Tarmac log
> >>>>>>>>>>> 3. UART
> >>>>>>>>>>>
> >>>>>>>>>>> Since the SError happens at the early stages of Linux boot the UART is not enabled yet.
> >>>>>>>>>>> From the Tarmac log we can see:
> >>>>>>>>>>>       3824884521 ps  ES  (ffff800080760888:d65f03c0) O el1h_ns:   ret     (parse_early_param)
> >>>>>>>>>>>       3824884522 ps  ES  (ffff800080763a60:d2801800) O el1h_ns:   mov     x0, #0xc0   //      #192    (setup_arch)
> >>>>>>>>>>>                          R X0 (AARCH64) 00000000 000000c0
> >>>>>>>>>>>       3824884523 ps  ES  (ffff800080763a64:d51b4220) O el1h_ns:   msr     daif,   x0      (setup_arch)
> >>>>>>>>>>>                          R CPSR 600000c5
> >>>>>>>>>>>       3824884529 ps  ES  System Error (Abort)
> >>>>>>>>>>>                          EXC [0x380] SError/vSError Current EL with SP_ELx
> >>>>>>>>>>>                          R ESR_EL1 (AARCH64) bf000002
> >>>>>>>>>>>                          R CPSR 600003c5
> >>>>>>>>>>>                          R SPSR_EL1 (AARCH64) 600000c5
> >>>>>>>>>>>                          R ELR_EL1 (AARCH64) ffff8000 80763a68
> >>>>>>>>>>>       3824884925 ps  ES  (ffff800080010b80:d10543ff) O el1h_ns:   sub     sp, sp,     #0x150  (vectors)
> >>>>>>>>>>>                          R SP_EL1 (AARCH64) ffff8000 808f3c50
> >>>>>>>>>>>       3824884925 ps  ES  (ffff800080010b84:8b2063ff) O el1h_ns:   add     sp, sp,     x0      (vectors)
> >>>>>>>>>>>                          R SP_EL1 (AARCH64) ffff8000 808f3d10
> >>>>>>>>>>>       3824884926 ps  ES  (ffff800080010b88:cb2063e0) O el1h_ns:   sub     x0, sp,     x0      (vectors)
> >>>>>>>>>>>                          R X0 (AARCH64) ffff8000 808f3c50
> >>>>>>>>>>>       3824884927 ps  ES  (ffff800080010b8c:37700080) O el1h_ns:   tbnz    w0, #14,    ffff800080010b9c        <vectors+0x39c>         (vectors)
> >>>>>>>>>>>       3824884935 ps  ES  (ffff800080010b90:cb2063e0) O el1h_ns:   sub     x0, sp,     x0      (vectors)
> >>>>>>>>>>>                          R X0 (AARCH64) 00000000 000000c0
> >>>>>>>>>>>       3824884937 ps  ES  (ffff800080010b94:cb2063ff) O el1h_ns:   sub     sp, sp,     x0      (vectors)
> >>>>>>>>>>>                          R SP_EL1 (AARCH64) ffff8000 808f3c50
> >>>>>>>>>>>       3824884938 ps  ES  (ffff800080010b98:140001ef) O el1h_ns:   b       ffff800080011354        <el1h_64_error>         (vectors)
> >>>>>>>>>>>
> >>>>>>>>>>> If I understand correctly, the exception happened sometime earlier and only
> >>>>>>>>>>> now did the Linux boot code (setup_arch) unmask the exceptions, and as a
> >>>>>>>>>>> result we immediately jump to the SError exception handler.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yes, that sounds reasonable. If I understood correctly, you are
> >>>>>>>>>> running something "quite new" on some software (QEMU) and hardware
> >>>>>>>>>> (Synopsys) simulators.
> >>>>>>>>>>
> >>>>>>>>>> That would mean that you have new hardware with e.g. a new memory map
> >>>>>>>>>> not used before. What you describe sounds like something in the code before
> >>>>>>>>>> Linux (the boot loader) results in the SError. This
> >>>>>>>>>> might be an access to non-existing or non-enabled hardware, i.e. it
> >>>>>>>>>> might be that you try to access (read/write) an address which is not
> >>>>>>>>>> available yet (or just invalid). It's hard to debug that. In case you
> >>>>>>>>>> are able to modify the code before Linux (the boot loader?) you might
> >>>>>>>>>> try to enable SError exceptions there, too, to get it earlier and
> >>>>>>>>>> with that make the search window smaller. I'm not that familiar with
> >>>>>>>>>> QEMU, but could you try to trace which (all?) hardware accesses your
> >>>>>>>>>> code does, and with that check whether
> >>>>>>>>>> all these accesses are valid even on the hardware (Synopsys) emulation
> >>>>>>>>>> system? That should be checked from a valid address and from a hardware
> >>>>>>>>>> subsystem enablement point of view.
> >>>>>>>>>>
> >>>>>>>>>> Hth,
> >>>>>>>>>>
> >>>>>>>>>> Dirk
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> From the Linux source:
> >>>>>>>>>>>           parse_early_param();
> >>>>>>>>>>>
> >>>>>>>>>>>           dynamic_scs_init();
> >>>>>>>>>>>
> >>>>>>>>>>>           /*
> >>>>>>>>>>>            * Unmask asynchronous aborts and fiq after bringing up possible
> >>>>>>>>>>>            * earlycon. (Report possible System Errors once we can report this
> >>>>>>>>>>>            * occurred).
> >>>>>>>>>>>            */
> >>>>>>>>>>>           local_daif_restore(DAIF_PROCCTX_NOIRQ); <---- This is when we get the exception.
> >>>>>>>>>>>
> >>>>>>>>>>> After some kernel hacking (replacing printk) we could extract the logs:
> >>>>>>>>>>> Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> >>>>>>>>>>> Linux version 6.5.0 (pliops@dev-liorw) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2023.02.1-95-g8391404e23) 11.3.0, GNU ld (GNU Binutils) 2.38) #101 SMP Sun Dec 17 20:09:06 IST 2023
> >>>>>>>>>>> Machine model: Pliops Spider MK-I EVK
> >>>>>>>>>>> SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
> >>>>>>>>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0 #101
> >>>>>>>>>>> Hardware name: Pliops Spider MK-I EVK (DT)
> >>>>>>>>>>> pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>>>>>>>>>> pc : setup_arch+0x13c/0x5ac
> >>>>>>>>>>> lr : setup_arch+0x134/0x5ac
> >>>>>>>>>>> sp : ffff8000808f3da0
> >>>>>>>>>>> x29: ffff8000808f3da0 x28: 0000000008758074 x27: 0000000005e31b58
> >>>>>>>>>>> x26: 0000000000000001 x25: 0000000007e5f728 x24: ffff8000808f8000
> >>>>>>>>>>> x23: ffff8000808f8600 x22: ffff8000807b6000 x21: ffff800080010000
> >>>>>>>>>>> x20: ffff800080a1e000 x19: fffffbfffddfe190 x18: 000000002266684a
> >>>>>>>>>>> x17: 00000000fcad60bb x16: 0000000000001800 x15: 0000000000000008
> >>>>>>>>>>> x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000003
> >>>>>>>>>>> x11: 0101010101010101 x10: ffffffffffee87df x9 : 0000000000000038
> >>>>>>>>>>> x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 0000000000000001
> >>>>>>>>>>> x5 : 0000000000000000 x4 : 8000000000000000 x3 : 0000000000000065
> >>>>>>>>>>> x2 : 0000000000000000 x1 : 0000000000000000 x0 : 00000000000000c0
> >>>>>>>>>>> Kernel panic - not syncing: Asynchronous SError Interrupt
> >>>>>>>>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0 #101
> >>>>>>>>>>> Hardware name: Pliops Spider MK-I EVK (DT)
> >>>>>>>>>>> Call trace:
> >>>>>>>>>>>       dump_backtrace+0x9c/0xd0
> >>>>>>>>>>>       show_stack+0x14/0x1c
> >>>>>>>>>>>       dump_stack_lvl+0x44/0x58
> >>>>>>>>>>>       dump_stack+0x14/0x1c
> >>>>>>>>>>>       panic+0x2e0/0x33c
> >>>>>>>>>>>       nmi_panic+0x68/0x6c
> >>>>>>>>>>>       arm64_serror_panic+0x68/0x78
> >>>>>>>>>>>       do_serror+0x24/0x54
> >>>>>>>>>>>       el1h_64_error_handler+0x2c/0x40
> >>>>>>>>>>>       el1h_64_error+0x64/0x68
> >>>>>>>>>>>       setup_arch+0x13c/0x5ac
> >>>>>>>>>>>       start_kernel+0x5c/0x5b8
> >>>>>>>>>>>       __primary_switched+0xb4/0xbc
> >>>>>>>>>>> ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
> >>>>>>>>>>>
> >>>>>>>>>>> Can you please advise how to proceed with debugging?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks in advance,
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>> Lior.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>
> >>> --
> >>> DENX Software Engineering GmbH,      Managing Director: Erika Unter
> >>> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> >>> Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email:
> hs@xxxxxxx
> >
> 
> --
> DENX Software Engineering GmbH,      Managing Director: Erika Unter
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email: hs@xxxxxxx




