From: Nick Desaulniers <ndesaulniers@xxxxxxxxxx> Date: Mon, 11 Jan 2021 12:03:29 -0800 > Hi Alexander, > I'm genuinely trying to reproduce/understand this report, questions below: Hi Nick! > On Sat, Jan 9, 2021 at 11:15 AM Alexander Lobakin <alobakin@xxxxx> wrote: >> >> From: Nick Desaulniers <ndesaulniers@xxxxxxxxxx> >> Date: Sat, 9 Jan 2021 09:50:44 -0800 >> >>> On Sat, Jan 9, 2021 at 9:11 AM Alexander Lobakin <alobakin@xxxxx> wrote: >>>> >>>> Machine: MIPS32 R2 Big Endian (interAptiv (multi)) >>>> >>>> While testing MIPS with LLVM, I found a weird and very rare bug with >>>> MIPS relocs that LLVM emits into kernel modules. It happens on both >>>> 11.0.0 and latest git snapshot and applies, as I can see, only to >>>> references to static symbols. >>>> >>>> When the kernel loads the module, it allocates a space for every >>>> section and then manually apply the relocations relative to the >>>> new address. >>>> >>>> Let's say we have a function phy_probe() in drivers/net/phy/libphy.ko. >>>> It's static and referenced only in phy_register_driver(), where it's >>>> used to fill callback pointer in a structure. >>>> >>>> The real function address after module loading is 0xc06c1444, that >>>> is observed in its ELF st_value field. >>>> There are two relocs related to this usage in phy_register_driver(): >>>> >>>> R_MIPS_HI16 refers to 0x3c010000 >>>> R_MIPS_LO16 refers to 0x24339444 > > Sorry, how are these calculated? (Explicit shell commands invoked > would be appreciated) Just look at the disassembly below and you'll see the similar instructions in your module. > I'm doing: > $ ARCH=mips CROSS_COMPILE=mips-linux-gnu- make CC=clang -j71 32r2_defconfig > $ ARCH=mips CROSS_COMPILE=mips-linux-gnu- make CC=clang -j71 modules > $ llvm-nm --format=sysv drivers/net/phy/phy_device.o | grep phy_probe > $ llvm-objdump -Dr --disassemble-symbols=phy_driver_register drivers/net/phy/phy_device.o > $ llvm-readelf -r drivers/net/phy/phy_device.o | grep -e R_MIPS_HI16 -e R_MIPS_LO16 > > for some of the commands trying to verify. > >>>> >>>> The address of .text is 0xc06b8000. So the destination is calculated >>>> as follows: >>>> >>>> 0x00000000 from hi16; >>>> 0xffff9444 from lo16 (sign extend as it's always treated as signed); >>>> 0xc06b8000 from base. >>>> >>>> = 0xc06b1444. The value is lower than the real phy_probe() address >>>> (0xc06c1444) by 0x10000 and is lower than the base address of >>>> module's .text, so it's 100% incorrect. > > The disassembly for me produces: > 399c: 3c 03 00 00 lui $3, 0 <phy_device_free> > 0000399c: R_MIPS_HI16 .text > ... > 39a8: 24 63 3a 5c addiu $3, $3, 14940 <phy_probe> > 000039a8: R_MIPS_LO16 .text So, in your case the values of the instructions that relocs refer are: 0x3c030000 R_MIPS_HI16 0x24633a5c R_MIPS_LO16 Mine were: 0x3c010000 0x24339444 Your second one doesn't have bit 15 set, so I think this pair won't break the code. Try to hunt for R_MIPS_LO16 that have this bit set, i.e. they have '8', '9', 'a', 'b', 'c', 'd' or 'e' as their [15:12]. > I'm not really sure how to manually resolve the relocations; Fangrui > do you have any tips? (I'm coincidentally reading through Linkers & > Loaders currently, but only just started chpt. 4). > >>>> >>>> This results in: >>>> >>>> [ 2.204022] CPU 3 Unable to handle kernel paging request at virtual >>>> address c06b1444, epc == c06b1444, ra == 803f1090 >>>> >>>> The correct instructions should be: >>>> >>>> R_MIPS_HI16 0x3c010001 >>>> R_MIPS_LO16 0x24339444 >>>> >>>> so there'll be 0x00010000 from hi16. >>>> >>>> I tried to catch those bugs in arch/mips/kernel/module.c (by checking >>>> if the destination is lower than the base address, which should never >>>> happen), and seems like I have only 3 such places in libphy.ko (and >>>> one in nf_tables.ko). >>>> I don't think it should be handled somehow in mentioned source code >>>> as it would look rather ugly and may break kernels build with GNU >>>> stack, which seems to not produce such bad codes. >>>> >>>> If I should report this to any other resources, please let me know. >>>> I chose clang-built-linux and LKML as it may not happen with userland >>>> (didn't tried to catch). >>> >>> Thanks for the report. Sounds like we may indeed be producing an >>> incorrect relocation. This is only seen for big endian triples? >> >> Unfortunately I don't have a LE board to play with, so can confirm >> only Big Endian. >> >> (BTW, if someone can say if it's possible for MIPS (and how if it is) >> to launch a LE kernel from BE-booted preloader and U-Boot, that would >> be super cool) >> >>> Getting a way for us to deterministically reproduce would be a good >>> first step. Which config or configs beyond defconfig, and which >>> relocations specifically are you observing this with? >> >> I use `make 32r2_defconfig` which combines several configs from >> arch/mips/configs: >> - generic_defconfig; >> - generic/32r2.config; >> - generic/eb.config. >> >> Aside from that, I enable a bunch of my WIP drivers and the >> Netfilter. On my setup, this bug is always present in libphy.ko, >> so CONFIG_PHYLIB=m (with all deps) should be enough. >> >> The three failed relocs belongs to this part of code: [0] >> >> llvm-readelf on them: >> >> Relocation section '.rel.text' at offset 0xbf60 contains 2281 entries: >> [...] >> 00005740 00029305 R_MIPS_HI16 00000000 .text >> 00005744 00029306 R_MIPS_LO16 00000000 .text >> 00005720 00029305 R_MIPS_HI16 00000000 .text >> 00005748 00029306 R_MIPS_LO16 00000000 .text >> 0000573c 00029305 R_MIPS_HI16 00000000 .text >> 0000574c 00029306 R_MIPS_LO16 00000000 .text >> >> The first pair is the one from my first mail: >> 0x3c010000 <-- should be 0x3c010001 to work properly >> 0x24339444 >> >> I'm planning to hunt for more now, will let you know. >> >> [0] https://elixir.bootlin.com/linux/v5.11-rc2/source/drivers/net/phy/phy_device.c#L2989 >> >> > Thanks, >> > ~Nick Desaulniers >> >> Thanks, >> Al >> > > Thanks, > ~Nick Desaulniers Thanks, Al