On 11/08/2022 14:13, Conor Dooley wrote:
> Hey Nathan,
>
> On 10/08/2022 20:43, Conor Dooley - M52691 wrote:
>> On 10/08/2022 20:32, Nathan Chancellor wrote:
>>> On Wed, Aug 10, 2022 at 07:20:24PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
>>>> On 10/08/2022 19:56, Nathan Chancellor wrote:
>>>>> Hi Conor,
>>>>>
>>>>> On Tue, Aug 09, 2022 at 11:05:32PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
>>>>>> +CC clang people :)
>>>>>>
>>>>>> Got an odd one here and would appreciate some pointers for where to
>>>>>> look. This code when built with gcc boots fine, for example with:
>>>>>> riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0
>>>>>> The same code but build with clang build it fails to boot but prior to
>>>>>> that applying this patchset it boots fine. Specifically it is the patch
>>>>>> "clk: microchip: mpfs: move id & offset out of clock structs"
>>>>>>
>>>>>> I applied this patchset on top of tonight's master (15205c2829ca) but
>>>>>> I've been seeing the same problem for a few weeks on -next too. I tried
>>>>>> the following 2 versions of clang/llvm:
>>>>>> ClangBuiltLinux clang version 15.0.0 (5b0788fef86ed7008a11f6ee19b9d86d42b6fcfa), LLD 15.0.0
>>>>>> ClangBuiltLinux clang version 15.0.0 (bab8af8ea062f6332b5c5d13ae688bb8900f244a), LLD 15.0.0
>>>>>
>>>>> Good to know that it reproduces with fairly recent versions of LLVM :)
>>>>>
>>>>>> It's probably something silly that I've overlooked but I am not au
>>>>>> fait with these sort of things unfortunately, but hey - at least I'll
>>>>>> learn something then.
>>>>>
>>>>> I took a quick glance at the patch you mentioned above and I don't
>>>>> immediately see anything as problematic...
>>>>
>>>> Yeah, I couldn't see any low hanging fruit either.
>>>>
>>>>> I was going to see if I could
>>>>> reproduce this locally in QEMU since I do see there is a machine
>>>>> 'microchip-icicle-kit' but I am not having much success getting the
>>>>> machine past SBI.
>>>>> Does this reproduce in QEMU or are you working with
>>>>> the real hardware? If QEMU, do you happen to have a working invocation
>>>>> handy?
>>>>
>>>> Yeah... So there was a QEMU incantation that worked at some point in
>>>> the past (ie when someone wrote the QEMU port) but most peripherals
>>>> are not implemented and current versions of our openSBI implementation
>>>> requires more than one of the unimplemented peripherals. I was trying to
>>>> get it working lately in the evenings based on some patches that were a
>>>> year old but no joy :/
>>>
>>> Heh, I guess that would explain why it wasn't working for me :)
>>>
>>>> I'm running on the real hardware, I'll give the older combo of qemu
>>>> "bios" etc a go again over the weekend & try to get it working. In the
>>>> meantime, any suggestions?
>>>
>>> Are you building with 'LLVM=1' or just 'CC=clang'? If 'LLVM=1', I would
>>> try breaking it apart into the individual options (LD=ld.lld,
>>> OBJCOPY=llvm-objcopy) and see if dropping one of those makes a
>>> difference. We have had subtle differences between the GNU and LLVM
>>> tools before and it is much easier to look into that difference if we
>>> know it happens in only one tool.
>>
>> LLVM=1.
>>
>>>
>>> Otherwise, I am not sure I have any immediate ideas other than looking
>>> at the disassembly and trying to see if something is going wrong. Is
>>> the object file being modified in any other way (I don't think there is
>>> something like objtool for RISC-V but I could be wrong)?
>>
>> I'll give the options a go so, I'll LYK how I get on.
>
> So I managed to wrangle QEMU into repro-ing. booting with bootloaders
> etc isn't going to work (nor will the config with gcc actually boot
> properly) but it gets far enough to reproduce the problem.
> You've got to jump right to the kernel for which the magic incantation
> is:
>
> $(QEMU)/qemu-system-riscv64 -M microchip-icicle-kit \
>     -m 2G -smp 5 \
>     -kernel $(wrkdir)/vmlinux.bin \
>     -dtb $(wrkdir)/riscvpc.dtb \
>     -display none -serial null \
>     -serial stdio
>
> (serial0 is disabled in the dt)
>
> With gcc there'll be a bunch of warnings like:
> clk_ahb: Zero divisor and CLK_DIVIDER_ALLOW_ZERO not set
> That's "fine", not sure if it's the lack of bootloaders or the
> emulation but 0 isn't a value the hardware will see. With the defconfig
> I provided it'll fail to boot fairly late on because of missing musb
> emulation.

FWIW, I posted a QEMU patch to fix the missing peripherals, so a direct
kernel boot works now for GCC:
https://lore.kernel.org/qemu-devel/20220813135127.2971754-1-mail@xxxxxxxxxxx

(btw, I am on libera as conchuod in #riscv if you ever wanna ping me
about something, usually still about for "sane" NA working hours too)

>
> Doesn't really matter since thats long enough to get past the switch
> out of earlycon which is where the clang built kernel dies.
>
> Didn't get a chance to look at disassembly etc today, but as I said
> last night it reproduces with GNU binutils.
>
> Thanks,
> Conor.
>
> On another note, brought up our QEMU port's state today so fixing
> it is now on the good ole, ever expanding todo list :)