On Fri, Jun 18, 2021 at 6:54 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > The registers being zero at that point is actually expected, so that's > not much of a hint. But yeah, clearly I got some stack initialization > offset or something wrong there, and I don't know modern m68k nearly > well enough to even guess where I screwed up. Oh. I think I might have an idea. In that ret_from_kernel_thread code, after it returns from thge kernel thread, I did addql #4,%sp RESTORE_SWITCH_STACK jra ret_from_exception where that "RESTORE_SWITCH_STACK" loads the user space registers from the user-space switch stack. BUT. The switch stack also has that "retpc", and that one should just be popped. So I think that code should read addql #4,%sp RESTORE_SWITCH_STACK addql #4,%sp jra ret_from_exception because we want to restore the switch stack registers (they'll all be 0) but we then want to get rid of retpc too before we jump to ret_from_exception, which will do the RESTORE_ALL. (The first 'addql' is to remove the argument we've pushed on the stack earlier in ret_from_kernel_thread, the second addql is to remove retpc). All the *normal* uses of RESTORE_SWITCH_STACK will just do "rts", but that's because they were called from the shallower stack. The ret_from_kernel_thread case is kind of special, because it had that stack frame layout built up manually, rather than have a call to it. In that sense, it's a bit more like the 'do_trace_entry/exit' code, which builds that switch stack using subql #4,%sp SAVE_SWITCH_STACK and then undoes it using that same RESTORE_SWITCH_STACK addql #4,%sp sequence that I _should_ have done in ret_from_kernel_thread. (Also, see ret_from_signal) I've attached an updated patch just in case my incoherent ramblings above didn't explain what I meant. It's the same as the previous patch, it just has that one extra stack update there. Does _this_ one perhaps work? Linus
arch/m68k/kernel/entry.S | 11 +++++++++++ arch/m68k/kernel/process.c | 14 +++++++++----- 2 files changed, 20 insertions(+), 5 deletions(-) diff --git a/arch/m68k/kernel/entry.S b/arch/m68k/kernel/entry.S index 9dd76fbb7c6b..88fe0e1a3793 100644 --- a/arch/m68k/kernel/entry.S +++ b/arch/m68k/kernel/entry.S @@ -119,6 +119,15 @@ ENTRY(ret_from_fork) addql #4,%sp jra ret_from_exception + | A kernel thread will jump here directly from resume, + | with the stack containing the full register state + | (pt_regs and switch_stack). + | + | The argument will be in d7, and the kernel function + | to call will be in a3. + | + | If the kernel function returns, we want to return + | to user space - it has done a kernel_execve(). ENTRY(ret_from_kernel_thread) | a3 contains the kernel thread payload, d7 - its argument movel %d1,%sp@- @@ -126,6 +135,8 @@ ENTRY(ret_from_kernel_thread) movel %d7,(%sp) jsr %a3@ addql #4,%sp + RESTORE_SWITCH_STACK + addql #4,%sp jra ret_from_exception #if defined(CONFIG_COLDFIRE) || !defined(CONFIG_MMU) diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c index da83cc83e791..0705f14871a3 100644 --- a/arch/m68k/kernel/process.c +++ b/arch/m68k/kernel/process.c @@ -158,13 +158,17 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg, p->thread.fs = get_fs().seg; if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) { - /* kernel thread */ - memset(frame, 0, sizeof(struct fork_frame)); + struct switch_stack *kstp = &frame->sw - 1; + + /* kernel thread - a kernel-side switch-stack and the full user fork_frame */ + memset(kstp, 0, sizeof(struct switch_stack) + sizeof(struct fork_frame)); + frame->regs.sr = PS_S; - frame->sw.a3 = usp; /* function */ - frame->sw.d7 = arg; - frame->sw.retpc = (unsigned long)ret_from_kernel_thread; + kstp->a3 = usp; /* function */ + kstp->d7 = arg; + kstp->retpc = (unsigned long)ret_from_kernel_thread; p->thread.usp = 0; + p->thread.ksp = (unsigned long)kstp; return 0; } memcpy(frame, container_of(current_pt_regs(), struct fork_frame, regs),