RE: [PATCH v2] Porting barebox to a new SoC

Thanks Ahmad, I will try posting the same question on the Linux mailing list.

> -----Original Message-----
> From: Ahmad Fatoum <a.fatoum@xxxxxxxxxxxxxx>
> Sent: Tuesday, August 22, 2023 11:01 AM
> To: Lior Weintraub <liorw@xxxxxxxxxx>
> Cc: Ahmad Fatoum <ahmad@xxxxxx>; barebox@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2] Porting barebox to a new SoC
> 
> Hello Lior,
> 
> On 03.08.23 13:17, Lior Weintraub wrote:
> > Hi Ahmad,
> >
> > Hope you had a great time at EOSS 2023 :-)
> 
> Thanks and sorry for the late answer.
> 
> > Quick recap and additional info on the current issue:
> >
> > 1.
> > The spider-soc QEMU with the additional GICv3 and timers was tested with
> > bare-metal code and proved to be OK.
> > This bare-metal code sets up the A53 timers and the GICv3 to handle interrupts
> > at various exception levels as well as various security levels:
> > EL1_NS_PHYSICAL_TIMER set as GROUP1_NON_SECURE
> > EL1_SCR_PHYSICAL_TIMER set as GROUP1_SECURE
> > EL2_PHYSICAL_TIMER set as GROUP1_SECURE
> > VIRTUAL_TIMER set as GROUP1_NON_SECURE
> 
> ok.
> 
> > 2.
> > The kernel we build with Buildroot runs OK on virt QEMU but gets stuck
> > partway through boot when we use our spider-soc QEMU.
> > There are a few differences between those runs:
> > a.
> > The virt QEMU is executed with the -kernel switch, so QEMU itself
> > implements the "bootloader" and prepares the DT given to the kernel.
> > On this platform the kernel starts at EL1.
> 
> This can be influenced e.g. on Virt with -M virt,virtualization=on, I think.
> 
> > b.
> > The spider-soc QEMU is executed with -device loader,file=spider-soc-bl1.elf
> > Just for easy execution and testing, this executable includes all the needed
> > binaries (as const data blobs) and copies them into the correct locations
> > before jumping to barebox.
> > The list of binaries includes barebox, the kernel, the DT, and the rootfs.
> > As you recall, BL31 is compiled via Trusted-Firmware-A and has all its
> > functions as empty stubs because we currently don't care about CPU power
> > states.
> > The proof that BL31 is executed correctly is that barebox now runs at EL2.
> 
> Good.
> 
> > At that point the Linux kernel starts and, as I mentioned, gets stuck
> > partway through boot (in the cpu_do_idle function; more details to follow).
> >
> > Debugging the kernel with GDB revealed a few differences:
> > 1. When running with barebox, the kernel starts at EL2 and at some point
> > moves to EL1.
> > Not sure if that has some impact on the following issue but thought it was
> > worth mentioning.
> > (We get a "CPU: All CPU(s) started at EL2" trace)
> 
> I get the same on an i.MX8M as well (multi-core Cortex-A53 SoC).
> 
> > Another difference that might be related to this exception level is that the
> > timer settings show that the kernel uses the physical timer (as opposed to the
> > virt QEMU run, which uses the virtual timer):
> > The spider-soc QEMU Timers dump:
> > CNTFRQ_EL0 = 0x3b9aca0
> > CNTP_CTL_EL0 = 0x5
> > CNTV_CTL_EL0 = 0x0
> > CNTP_TVAL_EL0 = 0xff1f2ad5
> > CNTP_CVAL_EL0 = 0xac5c3240
> > CNTV_TVAL_EL0 = 0x52c2d916
> > CNTV_CVAL_EL0 = 0x0
> >
> > The virt QEMU Timers dump:
> > CNTFRQ_EL0 = 0x3b9aca0
> > CNTP_CTL_EL0 = 0x0
> > CNTV_CTL_EL0 = 0x5
> > CNTP_TVAL_EL0 = 0xb8394fbc
> > CNTP_CVAL_EL0 = 0x0
> > CNTV_TVAL_EL0 = 0xffd18e39
> > CNTV_CVAL_EL0 = 0x479858aa
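The control values in the two dumps can be decoded bit by bit. The following is a minimal sketch, assuming the standard Arm generic timer layout for CNTP_CTL_EL0 and CNTV_CTL_EL0 (ENABLE in bit 0, IMASK in bit 1, ISTATUS in bit 2). The helper names decode_timer_ctl and interrupt_asserted are made up for illustration and do not come from the kernel or QEMU.

```python
# Bit positions of the Arm generic timer control registers
# (CNTP_CTL_EL0 and CNTV_CTL_EL0 share this layout).
ENABLE = 1 << 0   # timer enabled
IMASK = 1 << 1    # timer interrupt masked
ISTATUS = 1 << 2  # timer condition met (read-only status)


def decode_timer_ctl(value):
    """Return the three CTL fields as booleans (hypothetical helper)."""
    return {
        "enable": bool(value & ENABLE),
        "imask": bool(value & IMASK),
        "istatus": bool(value & ISTATUS),
    }


def interrupt_asserted(value):
    """True when the timer is enabled, expired and not masked."""
    fields = decode_timer_ctl(value)
    return fields["enable"] and fields["istatus"] and not fields["imask"]


# Values copied from the two dumps in this mail.
dumps = {
    "spider-soc": {"CNTP_CTL_EL0": 0x5, "CNTV_CTL_EL0": 0x0},
    "virt": {"CNTP_CTL_EL0": 0x0, "CNTV_CTL_EL0": 0x5},
}

for machine, regs in dumps.items():
    for name, value in regs.items():
        print(machine, name, decode_timer_ctl(value), interrupt_asserted(value))

# CNTFRQ_EL0 holds the counter frequency in Hz.
cntfrq_hz = 0x3b9aca0
print("CNTFRQ_EL0 =", cntfrq_hz, "Hz")
```

Read this way, 0x5 means enabled, unmasked and expired, so in both runs the timer that is in use appears to be asserting its interrupt at the time of the dump, and CNTFRQ_EL0 works out to 62.5 MHz on both machines. That would be consistent with the timers themselves behaving the same and the difference sitting further down the interrupt path.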
> >
> > 2. When running with barebox, the kernel fails to correctly set the GICv3
> > registers.
> > In other words, there are no timer events and hence the scheduler is not
> > running.
> > The code gets stuck in cpu_do_idle, but we also found that the RCU cb_list is
> > not empty (which probably explains why the scheduler hasn't started; just a guess).
> > We placed a breakpoint just before the call to wait_for_completion (in
> > function rcu_barrier in kernel/rcu/tree.c) and found:
> > bt
> > #0  rcu_barrier () at kernel/rcu/tree.c:4064
> > #1  0xffffffc08059e1b4 in mark_readonly () at init/main.c:1789
> > #2  kernel_init (unused=<optimized out>) at init/main.c:1838
> > #3  0xffffffc080015e48 in ret_from_fork () at arch/arm64/kernel/entry.S:853
> >
> > At that point rcu_state.barrier_cpu_count.counter is 1 (as opposed to virt
> > QEMU, where it is 0 at that point).
> > If we place the breakpoint a bit earlier in rcu_barrier (just before the
> > for_each_possible_cpu loop) and run a few more steps (to get the rdp), we
> > see that rdp->cblist.len is 0x268 (616):
> > p/x rdp->cblist
> > $1 = {head = 0xffffffc0808f06d0, tails = {0xffffff802fe55a78,
> > 0xffffff802fe55a78, 0xffffff802fe55a78, 0xffffff80001c22c8}, gp_seq = {0x0,
> > 0x0, 0x0, 0x0}, len = 0x268, seglen = {0x0, 0x0, 0x0, 0x268}, flags = 0x1}
> >
> > When we compare that with virt QEMU, we see that rdp->cblist.len is 0
> > there.
> >
> > IMHO, all of this is a result of the GICv3 settings not being applied properly.
> > As a result there are no timer interrupts.
> >
> > Further debugging of the GICv3 settings showed that the code (function
> > gic_cpu_init in drivers/irqchip/irq-gic-v3.c) tries to write 0xffffffff to
> > GICR_IGROUPR0 (configure SGIs/PPIs as non-secure Group 1), but when we
> > read it back we get all zeros.
> > Dumping the GICv3 settings after the call to init_IRQ
> > (showing only the differences):
> >                       Spider-SoC QEMU         virt QEMU
> > GICD_CTLR =           0x00000012              0x00000053
> > GICD_TYPER =          0x037a0402              0x037a0007
> > GICR0_IGROUPR0 =      0x00000000              0xffffffff
> > GICR0_ISENABLER0 =    0x00000000              0x0000007f
> > GICR0_ICENABLER0 =    0x00000000              0x0000007f
> > GICR0_ICFGR0 =        0x00000000              0xaaaaaaaa
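The GICD_CTLR and GICD_TYPER differences in the table can be decoded field by field. The following is a rough sketch, assuming the bit positions used by the kernel's include/linux/irqchip/arm-gic-v3.h (GICD_CTLR_ENABLE_G1, GICD_CTLR_ENABLE_G1A, GICD_CTLR_ARE_NS, GICD_CTLR_DS) plus the SecurityExtn (bit 10) and ITLinesNumber (bits 4:0) fields of GICD_TYPER. The function decode_gicd is a hypothetical helper, and the exact meaning of bits 0 and 1 depends on the security view in which the register is read.

```python
# Bit names follow the macros in include/linux/irqchip/arm-gic-v3.h.
GICD_CTLR_ENABLE_G1 = 1 << 0
GICD_CTLR_ENABLE_G1A = 1 << 1
GICD_CTLR_ARE_NS = 1 << 4
GICD_CTLR_DS = 1 << 6             # 1: security disabled (single security state)

GICD_TYPER_ITLINES_MASK = 0x1f    # ITLinesNumber, bits [4:0]
GICD_TYPER_SECURITY_EXTN = 1 << 10  # 1: two security states supported


def decode_gicd(ctlr, typer):
    """Decode the fields that differ between the two machines (hypothetical helper)."""
    return {
        "enable_g1": bool(ctlr & GICD_CTLR_ENABLE_G1),
        "enable_g1a": bool(ctlr & GICD_CTLR_ENABLE_G1A),
        "are_ns": bool(ctlr & GICD_CTLR_ARE_NS),
        "ds": bool(ctlr & GICD_CTLR_DS),
        "security_extn": bool(typer & GICD_TYPER_SECURITY_EXTN),
        # Same arithmetic as the kernel's GICD_TYPER_SPIS(): 32 * (N + 1) INTIDs.
        "intids": 32 * ((typer & GICD_TYPER_ITLINES_MASK) + 1),
    }


# Values copied from the table in this mail.
spider = decode_gicd(0x00000012, 0x037a0402)
virt = decode_gicd(0x00000053, 0x037a0007)

print("spider-soc:", spider)
print("virt:      ", virt)
```

If this reading is right, the Spider-SoC GIC reports SecurityExtn=1 with DS=0 (two security states), while the virt GIC runs with DS=1. As far as I recall from the GICv3 architecture specification, GICR_IGROUPR0 is a Secure register when DS=0, so non-secure accesses are RAZ/WI. That would match the all-zero readback and Ahmad's guess that the secure-side GIC setup in TF-A is missing, but it is worth checking against the specification.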
> >
> > Any thoughts?
> > As always, your support is much appreciated!
> 
> Sorry to disappoint, but I have no hands-on experience with the GIC.
> My guess would be that you are missing initialization in the TF-A...
> 
> Cheers,
> Ahmad
> 
> >
> > Cheers,
> > Lior.
> >
> >
> >> -----Original Message-----
> >> From: Ahmad Fatoum <a.fatoum@xxxxxxxxxxxxxx>
> >> Sent: Friday, June 30, 2023 8:53 AM
> >> To: Lior Weintraub <liorw@xxxxxxxxxx>; Ahmad Fatoum <ahmad@xxxxxx>;
> >> barebox@xxxxxxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH v2] Porting barebox to a new SoC
> >>
> >> Hi Lior,
> >>
> >> On 25.06.23 22:33, Lior Weintraub wrote:
> >>> Hello Ahmad,
> >>
> >> [Sorry for the delay, we're at EOSS 2023 currently]
> >>
> >>> I failed to reproduce this issue on virt because the addresses and
> >>> peripherals on the virt machine are different and it is difficult to
> >>> change our code to match that.
> >>> If you think this is critical I will make extra effort to make it work.
> >>> AFAIU, this suggestion was made to debug the "conflict" issue.
> >>
> >> It's not critical, but I'd have liked to understand this, so I can check
> >> if it's perhaps a barebox bug.
> >>
> >>> Currently the workaround I am using is just to set the size of the kernel
> >>> partition to match the exact size of the "Image" file.
> >>>
> >>> The other issue I am facing is that the kernel seems stuck in cpu_do_idle and
> >>> there is no login prompt from the kernel.
> >>
> >> Does it call into PSCI during idle?
> >>
> >>> As you recall, I am running on a custom QEMU that tries to emulate our
> >>> platform.
> >>> I suspect that I did something wrong with the GICv3 and timer
> >>> connectivity.
> >>> The code I used was based on examples I saw in sbsa-ref.c and virt.c.
> >>> In addition, I declared the GICv3 and timers in our device tree.
> >>>
> >>> I am running QEMU with "-d int", so I am also getting a trace of exceptions and
> >>> interrupts.
> >>
> >> Nice. Didn't know about this option.
> >>
> >> [snip]
> >>
> >>> Exception return from AArch64 EL3 to AArch64 EL1 PC 0xffffffc00802112c
> >>> Taking exception 13 [Secure Monitor Call] on CPU 0
> >>> ...from EL1 to EL3
> >>> ...with ESR 0x17/0x5e000000
> >>> ...with ELR 0xffffffc008021640
> >>> ...to EL3 PC 0x10005400 PSTATE 0x3cd
> >>> Exception return from AArch64 EL3 to AArch64 EL1 PC 0xffffffc008021640
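The "ESR 0x17/0x5e000000" pair in the trace can be cross-checked by splitting the syndrome value into its fields. The following is a small sketch, assuming the AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0). decode_esr and the EC_NAMES table are illustrative names and cover only three exception classes.

```python
# A few AArch64 exception classes (ESR_ELx.EC); not a complete table.
EC_NAMES = {
    0x15: "SVC instruction execution in AArch64 state",
    0x16: "HVC instruction execution in AArch64 state",
    0x17: "SMC instruction execution in AArch64 state",
}


def decode_esr(esr):
    """Split an ESR_ELx value into (EC, IL, ISS) (hypothetical helper)."""
    ec = (esr >> 26) & 0x3f   # exception class
    il = (esr >> 25) & 0x1    # 1: 32-bit instruction
    iss = esr & 0x1ffffff     # instruction specific syndrome
    return ec, il, iss


# Syndrome value copied from the QEMU "-d int" trace in this mail.
ec, il, iss = decode_esr(0x5e000000)
print(hex(ec), EC_NAMES.get(ec, "unknown"), "IL =", il, "ISS =", hex(iss))
```

The class field comes out as 0x17, matching the first number QEMU prints, with an ISS of zero. Together with the exception return landing on the same address as the reported ELR, this looks like an ordinary SMC round trip through EL3.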
> >>
> >> Looks fine so far? Doesn't look like it's hanging in EL1.
> >>
> >> [snip]
> >>
> >>> Segment Routing with IPv6
> >>> In-situ OAM (IOAM) with IPv6
> >>> sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
> >>> NET: Registered PF_PACKET protocol family
> >>> NET: Registered PF_KEY protocol family
> >>> NET: Registered PF_VSOCK protocol family
> >>> registered taskstats version 1
> >>> clk: Disabling unused clocks
> >>> Freeing unused kernel memory: 1664K
> >>
> >> Not sure. Normally, I'd try again with pd_ignore_unused clk_ignore_unused
> >> in the kernel arguments, but I think you define no clocks or power domains
> >> yet in the DT?
> >>
> >> You can try again with the kernel command line option initcall_debug and see
> >> which initcall is getting stuck. If nothing helps, maybe attach a hardware
> >> debugger?
> >>
> >> Cheers,
> >> Ahmad
> >>
> >> --
> >> Pengutronix e.K.                           |                             |
> >> Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
> >> 31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
> >> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
> >
> 
> --
> Pengutronix e.K.                           |                             |
> Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
> 31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
> 




