Hi,

On 27.05.23 21:34, Linus Torvalds wrote:
> On Sat, May 27, 2023 at 11:41 AM Frank Scheiner <frank.scheiner@xxxxxx> wrote:
>> Ok, I put the decoded console messages on [2].
>>
>> [2]: https://pastebin.com/dLYMijfS
>
> Ugh. Apparently ia64 decoding isn't great. But at least it gives multiple line numbers:
>
>     load_module (kernel/module/main.c:2291 kernel/module/main.c:2412 kernel/module/main.c:2868)
>
> except your kernel obviously has those test-patches, so I still don't know exactly where they are.

Erm, I see. I rebuilt a vanilla v6.4-rc3 and ran that; the decoded result is on [1]. Not sure whether it makes things any clearer.

[1]: https://pastebin.com/z5XzEnhq

I also tried to build and run an SP kernel to maybe get a better picture in the traces, but that seems to require FLATMEM, which doesn't seem to work on that machine, or at least not the way it is configured (and yes, I also used the wrong commit for it, and it was patched...):
```
[ 0.000000] Linux version 6.4.0-rc3-933174ae28ba72ab8de5b35cb7c98fc211235096-patch3_sp (root@x4270) (ia64-linux-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #1 Sat May 27 21:28:44 CEST 2023
[...]
[ 0.000000] ACPI: SSDT 0x000000003FE35BA8 00013C (v01 HP rx2620 00000006 INTL 20050309)
[ 0.000000] ACPI: Local APIC address (____ptrval____)
[ 0.000000] 1 CPUs available, 1 CPUs total
[...]
[ 0.000000] Kernel panic - not syncing: Cannot use FLATMEM with 246784MB hole
[ 0.000000] Please switch over to SPARSEMEM
[ 0.000000] ---[ end Kernel panic - not syncing: Cannot use FLATMEM with 246784MB hole
[ 0.000000] Please switch over to SPARSEMEM ]---
```

> But it looks like it is in move_module().
>
> Strange. I don't know how it gets to "__copy_user" from there...
>
> [ Looks at the ia64 code ]
>
> Oh. It turns out that it *says* __copy_user(), but the code is actually shared with the regular memcpy() function, which does
>
>     GLOBAL_ENTRY(memcpy)
>         and r28=0x7,in0
>         and r29=0x7,in1
>         mov f6=f0
>         mov retval=in0
>         br.cond.sptk .common_code
>         ;;
>
> where that ".common_code" label is - surprise surprise - the common copy code, and so when the oops reports that the problem happened in __copy_user(), it actually is in this case just a normal memcpy.
>
> Ok, so it's probably the
>
>     memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
>
> in move_module() that takes a fault.
>
> And looking at the registers, the destination is in r17/r18, and your dump has
>
>     unable to handle kernel paging request at virtual address 1000000000000000
>     ...
>     r17 : 0fffffffffffffff r18 : 1000000000000000
>
> so it's almost certainly that 'dest' that is bad.
>
> Which I guess shouldn't surprise anybody.
>
> But that's where my knowledge of ia64 and the new module loader layout ends.

Thanks for your help and for going as far as you could; that's greatly appreciated. Running that stuff is surely easier than debugging it. :-)

Cheers,
Frank