Re: m68k 54418 fails to execute user space

Jean-Michel Hautbois <jeanmichel.hautbois@xxxxxxxxxx> · Thu, 27 Jun 2024 16:52:55 +0200

Hi Greg,

On 27/06/2024 16:46, Greg Ungerer wrote:
Hi JM,

On 27/6/24 22:36, Jean-Michel Hautbois wrote:
Michael,

On 26/06/2024 21:36, Michael Schmitz wrote:
Jean-Michel,

On 27/06/24 01:28, Jean-Michel Hautbois wrote:
Hi Michael,

On 26/06/2024 03:56, Michael Schmitz wrote:
Jean-Michel,

On 24/06/24 20:56, Jean-Michel Hautbois wrote:

When I enable the first debug printk in do_page_fault(), I get this
for the first call to ls:
bash-5.2# ls
[   14.700000] do page fault:
[   14.700000] regs->sr=0x0, regs->pc=0x70069ee6, address=0x70069ee6, 0, (ptrval)

Page not present, read fault. Please disable obfuscation of kernel 
pointer addresses by printk. Maybe also disable address space 
randomization while debugging this.

This call almost works (I still have the "assert failed:
folio->private != NULL" issue).

And when I call it a second time, I get:
bash-5.2# ls
[   19.820000] do page fault:
[   19.820000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700e2004, 2, (ptrval)

Page not present, write fault.

It would be helpful if you could get a dump of /proc/1/maps before 
the execve() syscall in your helloworld init replacement. That 
might confirm all these addresses are legit (assuming mappings 
survive across execve(), that is), and what they correspond to.


The address falls in the region starting at ELF_ET_DYN_BASE, which I
set to 0x70000000.

regs->pc is not the same as the address. It might be irrelevant,
but any help understanding the process behind it is appreciated :-).

I keep digging, and I am now in the asm part, which scares me a bit!

I don't see that you'd need to look at any asm code here.

I added a small test in do_page_fault() that panics in case of an
error. The result follows:

Please take a look at the comments at the start of 
arch/m68k/mm/fault.c:do_page_fault(). The meaning of the bits in 
error_code are explained there.

error_code == 0 is just one possible case out of the four that are
handled by do_page_fault(). It does not signify 'no error' - if there
hadn't been a page fault, do_page_fault() would not have been called.

You just forced a panic each time a write fault and/or a protection 
fault happens. Write faults are absolutely expected to happen when 
loading a library - ld.so needs to perform relocation after loading a 
dynamic library, and that means writes to the GOT in the library's 
data segment (PIC assumed).


 ./scripts/decode_stacktrace.sh vmlinux < /tmp/trace.log
[    3.857000] Run /bin/bash as init process
[    3.858000]   with arguments:
[    3.861000]     /bin/bash
[    3.862000]   with environment:
[    3.863000]     HOME=/
[    3.864000]     TERM=linux
[    4.242000] do page fault:
[    4.242000] regs->sr=0x2000, regs->pc=0x41366924, address=0x700b3364, 2, 41fb0000
[    4.242000] Kernel panic - not syncing: page fault error
[    4.242000] CPU: 0 PID: 1 Comm: bash Not tainted 6.10.0-rc5-g927da6cf01fe-dirty #25
[    4.242000] Stack from 4186dda8:
[    4.242000]         4186dda8 41423aa4 41423aa4 700b3300 00000001 00000000 4136ee10 41423aa4
[    4.242000]         41366d7a 700b3364 700b3364 00000000 0000000d 4186de60 41fb0000 41d51a60
[    4.242000]         41005696 41416a90 41416a4d 00002000 41366924 700b3364 00000002 41fb0000
[    4.242000]         0000000a 700b3364 00000000 0000000d 00000012 41d51a00 4186de60 41d51a60
[    4.242000]         41fb81c0 41d51a60 410052fe 4100529a 4186de60 700b3364 00000002 00000000
[    4.242000]         700bc414 00000003 00008000 700ac000 41003660 4186de60 00000000 00000000
[    4.242000] Call Trace: dump_stack (lib/dump_stack.c:124)
[    4.242000] panic (kernel/panic.c:266 kernel/panic.c:368)
[    4.242000] do_page_fault (arch/m68k/mm/fault.c:88 (discriminator 1))
[    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
[    4.242000] buserr_c (arch/m68k/kernel/traps.c:725 arch/m68k/kernel/traps.c:775)
[    4.242000] buserr_c (arch/m68k/kernel/traps.c:748 arch/m68k/kernel/traps.c:775)
[    4.242000] buserr (arch/m68k/kernel/entry.S:116)
[    4.242000] ma_slots (lib/maple_tree.c:759)
[    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
[    4.242000] elf_load (fs/binfmt_elf.c:125 (discriminator 1) fs/binfmt_elf.c:421 (discriminator 1))
[    4.242000] load_elf_binary (fs/binfmt_elf.c:1132)
[    4.242000] memset (arch/m68k/lib/memset.c:11)
[    4.242000] load_misc_binary (fs/binfmt_misc.c:97 fs/binfmt_misc.c:146 fs/binfmt_misc.c:213)
[    4.242000] memset (arch/m68k/lib/memset.c:11)
[    4.242000] bprm_execve (fs/exec.c:1797 fs/exec.c:1839 fs/exec.c:1891 fs/exec.c:1867)
[    4.242000] copy_strings_kernel (fs/exec.c:669)
[    4.242000] count_strings_kernel (fs/exec.c:473)
[    4.242000] kernel_execve (fs/exec.c:2058)
[    4.242000] __dynamic_pr_debug (lib/dynamic_debug.c:865)
[    4.242000] run_init_process (init/main.c:1389)
[    4.242000] _printk (kernel/printk/printk.c:2365)
[    4.242000] kernel_init (init/main.c:1508)
[    4.242000] kernel_init (init/main.c:1459)
[    4.242000] ret_from_kernel_thread (arch/m68k/kernel/entry.S:142)
[    4.242000]
[    4.242000] ---[ end Kernel panic - not syncing: page fault error ]---

Looks like a memory mapping failure, but why?
My JTAG at this point dumps a list of 0s at 0x41fb0000 and my SDRAM 
starts at 0x40000000 and ends at 0x50000000 (256MB).
0x41fb0000 seems to be init's page directory. The fault address is in 
the range where I'd expect dynamic libraries to reside.

It looks like a TLB write miss which is obscure to me :-).

I tried to use the /proc but as expected it is not alive after 
mounting it.

The memory map ought to be accessible through sysrq - an alternative 
would be to modify the ELF binfmt handler and dump the map once ld.so 
has finished with relocations.

I added a dump in the binfmt_elf file:

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index a43897b03ce9..395f556f3a90 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -816,6 +816,63 @@ static int parse_elf_properties(struct file *f, const struct elf_phdr *phdr,
         return ret == -ENOENT ? 0 : ret;
  }

+static int dump_memory_map(struct task_struct *task)
+{
+    struct mm_struct *mm = task->mm;
+    struct vm_area_struct *vma;
+       MA_STATE(mas, &mm->mm_mt, 0, -1);
+    struct file *file;
+    struct path *path;
+    char *buf;
+    char *pathname;
+
+    // Acquire the read lock for mmap_lock
+    down_read(&mm->mmap_lock);
+       mas_lock(&mas);
+    for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas, ULONG_MAX)) {
+        if (vma->vm_file) {
+            buf = (char *)__get_free_page(GFP_KERNEL);
+            if (!buf) {
+                continue; // Handle memory allocation failure
+            }
+
+            file = vma->vm_file;
+            path = &file->f_path;
+            pathname = d_path(path, buf, PAGE_SIZE);
+            if (IS_ERR(pathname)) {
+                pathname = NULL;
+            }
+
+            pr_info("%lx-%lx %c%c%c%c %08lx %02x:%02x %lu %s\n",
+                vma->vm_start, vma->vm_end,
+                vma->vm_flags & VM_READ ? 'r' : '-',
+                vma->vm_flags & VM_WRITE ? 'w' : '-',
+                vma->vm_flags & VM_EXEC ? 'x' : '-',
+                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
+                vma->vm_pgoff << PAGE_SHIFT,
+                MAJOR(file->f_inode->i_rdev),
+                MINOR(file->f_inode->i_rdev),
+                file->f_inode->i_ino,
+                pathname ? pathname : "");
+
+            free_page((unsigned long)buf);
+        } else {
+            pr_info("%lx-%lx %c%c%c%c %08lx 00:00 0\n",
+                vma->vm_start, vma->vm_end,
+                vma->vm_flags & VM_READ ? 'r' : '-',
+                vma->vm_flags & VM_WRITE ? 'w' : '-',
+                vma->vm_flags & VM_EXEC ? 'x' : '-',
+                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
+                vma->vm_pgoff << PAGE_SHIFT);
+        }
+    }
+       mas_unlock(&mas);
+    // Release the read lock for mmap_lock
+    up_read(&mm->mmap_lock);
+
+    return 0;
+}
+
  static int load_elf_binary(struct linux_binprm *bprm)
  {
         struct file *interpreter = NULL; /* to shut gcc up */
@@ -1299,6 +1356,9 @@ static int load_elf_binary(struct linux_binprm *bprm)

         finalize_exec(bprm);
         START_THREAD(elf_ex, regs, elf_entry, bprm->p);
+       if (current->pid == 1) {  // Check if this is the init process
+            dump_memory_map(current);
+    }
         retval = 0;
  out:
         return retval;

I think it is quick and dirty, but seems to do the trick.
I then get in my console:
[    4.265000] 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
[    4.266000] 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
[    4.267000] 70000000-700ac000 r-xp 00000000 00:00 27 /bin/bash
[    4.268000] 700ac000-700b4000 rw-p 000ac000 00:00 27 /bin/bash
[    4.269000] 700b4000-700be000 rwxp 700b4000 00:00 0
[    4.270000] bfe7a000-bfe9c000 rw-p bffde000 00:00 0

But nothing rings a bell at this level for me...
Thanks !
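
One way to read such a dump is to check which mapping each reported
fault address falls into. The sketch below does that in plain
user-space C, with the six ranges copied from the console output
above. The struct and function names are hypothetical, and it assumes
the layout is the same from one boot to the next (no address space
randomization).

```c
#include <stddef.h>

/* Hypothetical helper type: one line of the dumped memory map. */
struct map_entry {
	unsigned long start; /* first address of the mapping (inclusive) */
	unsigned long end;   /* end of the mapping (exclusive) */
	const char *perms;   /* permission string as printed in the dump */
	const char *name;    /* backing file, or "" for anonymous memory */
};

/* Ranges copied from the init (bash) dump printed at exec time. */
static const struct map_entry init_map[] = {
	{ 0x60000000UL, 0x6001e000UL, "r-xp", "/lib/ld.so.1" },
	{ 0x6001e000UL, 0x60022000UL, "rw-p", "/lib/ld.so.1" },
	{ 0x70000000UL, 0x700ac000UL, "r-xp", "/bin/bash" },
	{ 0x700ac000UL, 0x700b4000UL, "rw-p", "/bin/bash" },
	{ 0x700b4000UL, 0x700be000UL, "rwxp", "" },
	{ 0xbfe7a000UL, 0xbfe9c000UL, "rw-p", "" },
};

/* Return the mapping that contains addr, or NULL if none does. */
const struct map_entry *find_mapping(unsigned long addr)
{
	size_t i;

	for (i = 0; i < sizeof(init_map) / sizeof(init_map[0]); i++) {
		if (addr >= init_map[i].start && addr < init_map[i].end)
			return &init_map[i];
	}
	return NULL;
}
```

Under these assumptions, find_mapping(0x700b3364), the address from
the forced panic, lands in the rw-p data mapping of /bin/bash, and
find_mapping(0x70069ee6) lands in its r-xp text mapping. The addresses
from the second ls (0x700e2004 and pc 0x6011d65a) match none of the
six ranges, which is not surprising since the dump is taken at exec
time, before anything is mapped later.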

Here is the same dump trace generated on my newly resurrected M5475EVB 
for comparison:

[snip]
Freeing unused kernel image (initmem) memory: 80K
This architecture does not have kernel memory protection.
Run /sbin/init as init process
Run /etc/init as init process
Run /bin/init as init process
process '/bin/init' started with executable stack

I don't get this message; I suppose it is related to uClibc vs libc?

60000000-60008000 r-xp 00000000 00:00 550544 /lib/ld-uClibc-0.9.33.2.so
60008000-6000c000 rw-p 00006000 00:00 550544 /lib/ld-uClibc-0.9.33.2.so
80000000-80004000 r-xp 00000000 00:00 1882624 /bin/init
80004000-80008000 rw-p 00002000 00:00 1882624 /bin/init

Your init is at 0x80000000 and not 0x70000000... Interesting. Even if I
don't think it has a big impact...

bfc9a000-bfcbc000 rwxp bffde000 00:00 0
Welcome to
...

Execution otherwise continues as normal to a shell after this.

Regards
Greg