Hi Michael,
On 26/06/2024 03:56, Michael Schmitz wrote:
Jean-Michel,
On 24/06/24 20:56, Jean-Michel Hautbois wrote:
When I printk the do_page_fault first debug, I get for the first call
to ls:
bash-5.2# ls
[ 14.700000] do page fault:
[ 14.700000] regs->sr=0x0, regs->pc=0x70069ee6, address=0x70069ee6, 0, (ptrval)
Page not present, read fault. Please disable obfuscation of kernel
pointer addresses by printk. Maybe also disable address space
randomization while debugging this.
This call works almost fine (I still have the assert failed:
folio->private != NULL issue).
And when I call it a second time, I get:
bash-5.2# ls
[ 19.820000] do page fault:
[ 19.820000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700e2004, 2, (ptrval)
Page not present, write fault.
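For reference, the readings above ("0" is a not-present read fault, "2" a not-present write fault) follow the bit layout described in the comment above do_page_fault() in arch/m68k/mm/fault.c: bit 0 distinguishes a missing page from a protection fault, bit 1 distinguishes read from write. A minimal userspace sketch of that decoding, where the helper names (fault_is_protection, fault_is_write, describe_fault) are made up for illustration and are not kernel functions:

```c
#include <stdbool.h>
#include <stdio.h>

/* Bit 0 of error_code: 0 = no page found, 1 = protection fault. */
static bool fault_is_protection(unsigned long error_code)
{
	return (error_code & 1) != 0;
}

/* Bit 1 of error_code: 0 = read access, 1 = write access. */
static bool fault_is_write(unsigned long error_code)
{
	return (error_code & 2) != 0;
}

/* Hypothetical helper: format a one-line description into buf. */
static void describe_fault(unsigned long error_code, char *buf, size_t len)
{
	snprintf(buf, len, "%s, %s fault",
		 fault_is_protection(error_code) ? "Protection violation"
						 : "Page not present",
		 fault_is_write(error_code) ? "write" : "read");
}
```

Calling describe_fault() with the two codes from the logs (0 and 2) gives the two descriptions quoted above.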
It would be helpful if you could get a dump of /proc/1/maps before the
execve() syscall in your helloworld init replacement. That might confirm
all these addresses are legit (assuming mappings survive across
execve(), that is), and what they correspond to.
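A rough sketch of what such a dump could look like in the init replacement. This assumes /proc is already mounted when the helper runs (for example via an earlier mount("proc", "/proc", "proc", 0, NULL) call, which is not shown and is an assumption about the test setup). The function name dump_file is hypothetical:

```c
#include <stdio.h>

/*
 * Copy a text file (for instance "/proc/self/maps", which is the same
 * as /proc/1/maps when called from init) to the given stream.
 * Returns the number of bytes copied, or -1 if the file cannot be opened.
 */
static long dump_file(const char *path, FILE *out)
{
	char buf[256];
	size_t n;
	long total = 0;
	FILE *in = fopen(path, "r");

	if (!in)
		return -1;

	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		fwrite(buf, 1, n, out);
		total += (long)n;
	}

	fclose(in);
	fflush(out);
	return total;
}
```

In the helloworld init this would be called as dump_file("/proc/self/maps", stderr) immediately before the execve() call; a return value of -1 would indicate that /proc is not usable at that point.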
The address corresponds to the defined zone ELF_ET_DYN_BASE as I set
it to 0x70000000.
regs->pc is not the same as the address. It might be irrelevant, but
any help in understanding the process behind this is appreciated :-).
I keep digging, and I am now in the asm part, which scares me a bit!
I don't see that you'd need to look at any asm code here.
I added a small test in do_page_fault that panics in case of an error.
The result follows:
./scripts/decode_stacktrace.sh vmlinux < /tmp/trace.log
[ 3.857000] Run /bin/bash as init process
[ 3.858000] with arguments:
[ 3.861000] /bin/bash
[ 3.862000] with environment:
[ 3.863000] HOME=/
[ 3.864000] TERM=linux
[ 4.242000] do page fault:
[ 4.242000] regs->sr=0x2000, regs->pc=0x41366924, address=0x700b3364, 2, 41fb0000
[ 4.242000] Kernel panic - not syncing: page fault error
[ 4.242000] CPU: 0 PID: 1 Comm: bash Not tainted 6.10.0-rc5-g927da6cf01fe-dirty #25
[ 4.242000] Stack from 4186dda8:
[ 4.242000] 4186dda8 41423aa4 41423aa4 700b3300 00000001 00000000 4136ee10 41423aa4
[ 4.242000] 41366d7a 700b3364 700b3364 00000000 0000000d 4186de60 41fb0000 41d51a60
[ 4.242000] 41005696 41416a90 41416a4d 00002000 41366924 700b3364 00000002 41fb0000
[ 4.242000] 0000000a 700b3364 00000000 0000000d 00000012 41d51a00 4186de60 41d51a60
[ 4.242000] 41fb81c0 41d51a60 410052fe 4100529a 4186de60 700b3364 00000002 00000000
[ 4.242000] 700bc414 00000003 00008000 700ac000 41003660 4186de60 00000000 00000000
[ 4.242000] Call Trace: dump_stack (lib/dump_stack.c:124)
[ 4.242000] panic (kernel/panic.c:266 kernel/panic.c:368)
[ 4.242000] do_page_fault (arch/m68k/mm/fault.c:88 (discriminator 1))
[ 4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
[ 4.242000] buserr_c (arch/m68k/kernel/traps.c:725 arch/m68k/kernel/traps.c:775)
[ 4.242000] buserr_c (arch/m68k/kernel/traps.c:748 arch/m68k/kernel/traps.c:775)
[ 4.242000] buserr (arch/m68k/kernel/entry.S:116)
[ 4.242000] ma_slots (lib/maple_tree.c:759)
[ 4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
[ 4.242000] elf_load (fs/binfmt_elf.c:125 (discriminator 1) fs/binfmt_elf.c:421 (discriminator 1))
[ 4.242000] load_elf_binary (fs/binfmt_elf.c:1132)
[ 4.242000] memset (arch/m68k/lib/memset.c:11)
[ 4.242000] load_misc_binary (fs/binfmt_misc.c:97 fs/binfmt_misc.c:146 fs/binfmt_misc.c:213)
[ 4.242000] memset (arch/m68k/lib/memset.c:11)
[ 4.242000] bprm_execve (fs/exec.c:1797 fs/exec.c:1839 fs/exec.c:1891 fs/exec.c:1867)
[ 4.242000] copy_strings_kernel (fs/exec.c:669)
[ 4.242000] count_strings_kernel (fs/exec.c:473)
[ 4.242000] kernel_execve (fs/exec.c:2058)
[ 4.242000] __dynamic_pr_debug (lib/dynamic_debug.c:865)
[ 4.242000] run_init_process (init/main.c:1389)
[ 4.242000] _printk (kernel/printk/printk.c:2365)
[ 4.242000] kernel_init (init/main.c:1508)
[ 4.242000] kernel_init (init/main.c:1459)
[ 4.242000] ret_from_kernel_thread (arch/m68k/kernel/entry.S:142)
[ 4.242000]
[ 4.242000] ---[ end Kernel panic - not syncing: page fault error ]---
Looks like a memory mapping failure, but why?
My JTAG at this point dumps a list of 0s at 0x41fb0000 and my SDRAM
starts at 0x40000000 and ends at 0x50000000 (256MB).
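As a sanity check on the numbers in the panic above, here is a small sketch that classifies them against the ranges mentioned in this thread. The SDRAM bounds and the 0x70000000 base are the values quoted in these mails; PS_S (0x2000) is the m68k status register supervisor bit. The helper names and the DYN_BASE macro name are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define SDRAM_START 0x40000000u /* start of SDRAM, as stated above */
#define SDRAM_END   0x50000000u /* exclusive end, 256 MiB later */
#define DYN_BASE    0x70000000u /* value chosen for ELF_ET_DYN_BASE here */
#define PS_S        0x2000u     /* supervisor bit in the m68k SR */

/* True if addr lies inside the physical SDRAM window. */
static bool in_sdram(uint32_t addr)
{
	return addr >= SDRAM_START && addr < SDRAM_END;
}

/* True if addr lies at or above the chosen ELF_ET_DYN_BASE. */
static bool in_dyn_area(uint32_t addr)
{
	return addr >= DYN_BASE;
}

/* True if the saved SR shows the fault was taken in supervisor mode. */
static bool from_kernel_mode(uint16_t sr)
{
	return (sr & PS_S) != 0;
}
```

With these, regs->pc (0x41366924) and the last printed value (0x41fb0000) fall inside SDRAM, the faulting address (0x700b3364) falls in the area above ELF_ET_DYN_BASE, and sr=0x2000 means the fault was taken in kernel mode, which matches __clear_user showing up in the trace. The earlier faults with sr=0x0 came from user mode.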
It looks like a TLB write miss, which is obscure to me :-).
I tried to use /proc but, as expected, it is not alive after mounting it.
Thanks,
JM
Cheers,
Michael
Thanks!
JM