Re: [PATCH v4 0/3] m68k: Improved switch stack handling

Michael Schmitz <schmitzmic@xxxxxxxxx> · Fri, 23 Jul 2021 16:23:12 +1200

Hi Eric,

Am 23.07.2021 um 02:49 schrieb Eric W. Biederman:
Michael Schmitz <schmitzmic@xxxxxxxxx> writes:

Hi Eric,

On 21/07/21 8:32 am, Eric W. Biederman wrote:

diff --git a/arch/m68k/fpsp040/skeleton.S b/arch/m68k/fpsp040/skeleton.S
index a8f4161..6c92d38 100644
--- a/arch/m68k/fpsp040/skeleton.S
+++ b/arch/m68k/fpsp040/skeleton.S
@@ -502,7 +502,17 @@ in_ea:
  	.section .fixup,#alloc,#execinstr
  	.even
  1:
+
+	SAVE_ALL_INT
+	SAVE_SWITCH_STACK
         ^^^^^^^^^^

I don't think this saves the registers in the well known fixed location
on the stack because some registers are saved at the exception entry
point.

The FPU exception entry points are not using the exception entry code in
head.S. These entry points are stored in the exception vector table directly. No
saving of a syscall stack frame happens there. The FPU places its exception
frame on the stack, and that is what the FPU exception handlers use.

(If these have to call out to the generic exception handlers again, they will
build a minimal stack frame, see code in skeleton.S.)

Calling fpsp040_die() is no different from calling a syscall that may need to
have access to the full stack frame. The 'fixed location' is just 'on the stack
before calling  fpsp040_die()', again this is no different from calling
e.g. sys_fork() which does not take a pointer to the begin of the stack frame as
an argument.

I must admit I never looked at how do_exit() figures out where the stack frame
containing the saved registers is stored, I just assumed it unwinds the stack up
to the point where the caller syscall was made, and works from there. The same
strategy ought to work here.

For do_exit the part we need to be careful with is PTRACE_EVENT_EXIT,
which means it is ptrace that we need to look at.

For m68k the code in put_reg and get_reg finds the registers by looking
at task->thread.esp0.

Thanks, that's what I was missing here.

I was expecting m68k to use the same technique as alpha which expects a
fixed offset from task_stack_page(task).

So your code will work if you add code to update task->thread.esp0 which
is also known as THREAD_ESP0 in entry.S

Shoving

movel   %sp,%curptr@(TASK_THREAD+THREAD_ESP0)

in between the SAVE_ALL_INT and SAVE_SWITCH_STACK ought to do the trick 
there.


Without being saved at the well known fixed location if some process
stops in PTRACE_EVENT_EXIT in do_exit we likely get some complete
gibberish.

That is probably safe.

  	jbra	fpsp040_die
+	addql   #8,%sp
+	addql   #8,%sp
+	addql   #8,%sp
+	addql   #8,%sp
+	addql   #8,%sp
+	addql   #4,%sp
+	rts
Especially as everything after jumping to fpsp040_die does not execute.

Unless we change fpsp040_die() to call force_sig(SIGSEGV).

Yes.  I think we would probably need to have it also call get_signal and
all of that, because I don't think the very light call path for that
exception includes testing if signals are pending.

As far as I can see, there is a test for pending signals:

ENTRY(ret_from_exception)
.Lret_from_exception:
        btst    #5,%sp@(PT_OFF_SR)      | check if returning to kernel
        bnes    1f                      | if so, skip resched, signals
        | only allow interrupts when we are really the last one on the
        | kernel stack, otherwise stack overflow can occur during
        | heavy interrupt load
        andw    #ALLOWINT,%sr

resume_userspace:
        movel   %curptr@(TASK_STACK),%a1
        moveb   %a1@(TINFO_FLAGS+3),%d0	| bits 0-7 of TINFO_FLAGS
        jne     exit_work		| any bit set? -> exit_work
1:      RESTORE_ALL

exit_work:
        | save top of frame
        movel   %sp,%curptr@(TASK_THREAD+THREAD_ESP0)
        lslb    #1,%d0			| shift out TIF_NEED_RESCHED
        jne     do_signal_return	| any remaining bit 
(signal/notify_resume)? -> do_signal_return
        pea     resume_userspace
        jra     schedule

As long as TIF_NOTIFY_SIGNAL or TIF_SIGPENDING are set, do_signal_return 
will be called.



The way the code is structured it is actively incorrect to return from
fpsp040_die, as the code does not know what to do if it reads a byte
from userspace and there is nothing there.

Correct - my hope is that upon return from the FPU exception (that 
continued after a dodgy read or write), we get the signal delivered and 
will die then.


So instead of handling -EFAULT like most pieces of kernel code the code
just immediately calls do_exit, and does not even attempt to handle
the error.

That is not my favorite strategy at all, but I suspect it isn't worth
it, or safe to update the skeleton.S to handle errors.  Especially as we
have not even figured out how to test that code yet.

That's bothering me more than a little, but I need to find out whether 
the emulator even handles FPU exceptions correctly ...

Cheers,

	Michael


Eric