Hi Alexander,
I'm genuinely trying to reproduce/understand this report; questions below.

On Sat, Jan 9, 2021 at 11:15 AM Alexander Lobakin <alobakin@xxxxx> wrote:
>
> From: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
> Date: Sat, 9 Jan 2021 09:50:44 -0800
>
> > On Sat, Jan 9, 2021 at 9:11 AM Alexander Lobakin <alobakin@xxxxx> wrote:
> >>
> >> Machine: MIPS32 R2 Big Endian (interAptiv (multi))
> >>
> >> While testing MIPS with LLVM, I found a weird and very rare bug with
> >> MIPS relocs that LLVM emits into kernel modules. It happens on both
> >> 11.0.0 and latest git snapshot and applies, as I can see, only to
> >> references to static symbols.
> >>
> >> When the kernel loads the module, it allocates a space for every
> >> section and then manually apply the relocations relative to the
> >> new address.
> >>
> >> Let's say we have a function phy_probe() in drivers/net/phy/libphy.ko.
> >> It's static and referenced only in phy_register_driver(), where it's
> >> used to fill callback pointer in a structure.
> >>
> >> The real function address after module loading is 0xc06c1444, that
> >> is observed in its ELF st_value field.
> >> There are two relocs related to this usage in phy_register_driver():
> >>
> >> R_MIPS_HI16 refers to 0x3c010000
> >> R_MIPS_LO16 refers to 0x24339444

Sorry, how are these calculated? (Explicit shell commands invoked
would be appreciated)

I'm doing:
$ ARCH=mips CROSS_COMPILE=mips-linux-gnu- make CC=clang -j71 32r2_defconfig
$ ARCH=mips CROSS_COMPILE=mips-linux-gnu- make CC=clang -j71 modules
$ llvm-nm --format=sysv drivers/net/phy/phy_device.o | grep phy_probe
$ llvm-objdump -Dr --disassemble-symbols=phy_driver_register drivers/net/phy/phy_device.o
$ llvm-readelf -r drivers/net/phy/phy_device.o | grep -e R_MIPS_HI16 -e R_MIPS_LO16

as some of the commands to try to verify this.

> >>
> >> The address of .text is 0xc06b8000.
> >> So the destination is calculated as follows:
> >>
> >> 0x00000000 from hi16;
> >> 0xffff9444 from lo16 (sign extend as it's always treated as signed);
> >> 0xc06b8000 from base.
> >>
> >> = 0xc06b1444. The value is lower than the real phy_probe() address
> >> (0xc06c1444) by 0x10000 and is lower than the base address of
> >> module's .text, so it's 100% incorrect.

The disassembly for me produces:

    399c: 3c 03 00 00   lui $3, 0 <phy_device_free>
                        0000399c:  R_MIPS_HI16  .text
...
    39a8: 24 63 3a 5c   addiu $3, $3, 14940 <phy_probe>
                        000039a8:  R_MIPS_LO16  .text

I'm not really sure how to manually resolve the relocations; Fangrui
do you have any tips? (I'm coincidentally reading through Linkers &
Loaders currently, but only just started chpt. 4).

> >>
> >> This results in:
> >>
> >> [ 2.204022] CPU 3 Unable to handle kernel paging request at virtual
> >> address c06b1444, epc == c06b1444, ra == 803f1090
> >>
> >> The correct instructions should be:
> >>
> >> R_MIPS_HI16 0x3c010001
> >> R_MIPS_LO16 0x24339444
> >>
> >> so there'll be 0x00010000 from hi16.
> >>
> >> I tried to catch those bugs in arch/mips/kernel/module.c (by checking
> >> if the destination is lower than the base address, which should never
> >> happen), and seems like I have only 3 such places in libphy.ko (and
> >> one in nf_tables.ko).
> >> I don't think it should be handled somehow in mentioned source code
> >> as it would look rather ugly and may break kernels build with GNU
> >> stack, which seems to not produce such bad codes.
> >>
> >> If I should report this to any other resources, please let me know.
> >> I chose clang-built-linux and LKML as it may not happen with userland
> >> (didn't tried to catch).
> >
> > Thanks for the report. Sounds like we may indeed be producing an
> > incorrect relocation. This is only seen for big endian triples?
>
> Unfortunately I don't have a LE board to play with, so can confirm
> only Big Endian.
>
> (BTW, if someone can say if it's possible for MIPS (and how if it is)
> to launch a LE kernel from BE-booted preloader and U-Boot, that would
> be super cool)
>
> > Getting a way for us to deterministically reproduce would be a good
> > first step. Which config or configs beyond defconfig, and which
> > relocations specifically are you observing this with?
>
> I use `make 32r2_defconfig` which combines several configs from
> arch/mips/configs:
> - generic_defconfig;
> - generic/32r2.config;
> - generic/eb.config.
>
> Aside from that, I enable a bunch of my WIP drivers and the
> Netfilter. On my setup, this bug is always present in libphy.ko,
> so CONFIG_PHYLIB=m (with all deps) should be enough.
>
> The three failed relocs belongs to this part of code: [0]
>
> llvm-readelf on them:
>
> Relocation section '.rel.text' at offset 0xbf60 contains 2281 entries:
> [...]
> 00005740 00029305 R_MIPS_HI16 00000000 .text
> 00005744 00029306 R_MIPS_LO16 00000000 .text
> 00005720 00029305 R_MIPS_HI16 00000000 .text
> 00005748 00029306 R_MIPS_LO16 00000000 .text
> 0000573c 00029305 R_MIPS_HI16 00000000 .text
> 0000574c 00029306 R_MIPS_LO16 00000000 .text
>
> The first pair is the one from my first mail:
> 0x3c010000 <-- should be 0x3c010001 to work properly
> 0x24339444
>
> I'm planning to hunt for more now, will let you know.
>
> [0] https://elixir.bootlin.com/linux/v5.11-rc2/source/drivers/net/phy/phy_device.c#L2989
>
> > Thanks,
> > ~Nick Desaulniers
>
> Thanks,
> Al
>

--
Thanks,
~Nick Desaulniers
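
For reference, the hi16/lo16 arithmetic quoted above can be replayed by
hand. The following is only a rough sketch, not the kernel's code: it
assumes the usual REL convention in which the addend lives in the
16-bit immediates of the two instructions, with the lo16 half
sign-extended, and it uses the instruction words and .text base from
the report. The helper names (sign_extend16, rel_addend, resolve,
split_hi_lo) are hypothetical and made up for this illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def rel_addend(hi_insn, lo_insn):
    """Combine the immediates of a HI16/LO16 instruction pair (assumed REL layout)."""
    return (((hi_insn & 0xFFFF) << 16) + sign_extend16(lo_insn)) & MASK32


def resolve(base, hi_insn, lo_insn):
    """Address the pair points at once the section is loaded at base."""
    return (base + rel_addend(hi_insn, lo_insn)) & MASK32


def split_hi_lo(offset):
    """Split an offset into (hi16, lo16); +0x8000 compensates for lo16 being signed."""
    return ((offset + 0x8000) >> 16) & 0xFFFF, offset & 0xFFFF


TEXT_BASE = 0xC06B8000  # address of .text from the report

observed = resolve(TEXT_BASE, 0x3C010000, 0x24339444)  # pair as emitted
expected = resolve(TEXT_BASE, 0x3C010001, 0x24339444)  # pair with hi16 = 1

print(hex(observed))              # 0xc06b1444, the faulting address
print(hex(expected))              # 0xc06c1444, the real phy_probe() address
print(split_hi_lo(0x9444))        # (1, 37956): hi16 has to carry a 1 here
```

Under these assumptions, any offset whose low half has bit 15 set
(0x9444 here) needs hi16 bumped by one, which matches the 0x10000
difference described in the report. An offset such as 0x3a5c, as in
the disassembly above, has bit 15 clear and needs no such adjustment.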