Michael Schmitz <schmitzmic@xxxxxxxxx> writes: > Hi Eric, > > Am 23.07.2021 um 02:49 schrieb Eric W. Biederman: >> Michael Schmitz <schmitzmic@xxxxxxxxx> writes: >> >>> Hi Eric, >>> >>> On 21/07/21 8:32 am, Eric W. Biederman wrote: >>>> >>>>> diff --git a/arch/m68k/fpsp040/skeleton.S b/arch/m68k/fpsp040/skeleton.S >>>>> index a8f4161..6c92d38 100644 >>>>> --- a/arch/m68k/fpsp040/skeleton.S >>>>> +++ b/arch/m68k/fpsp040/skeleton.S >>>>> @@ -502,7 +502,17 @@ in_ea: >>>>> .section .fixup,#alloc,#execinstr >>>>> .even >>>>> 1: >>>>> + >>>>> + SAVE_ALL_INT >>>>> + SAVE_SWITCH_STACK >>>> ^^^^^^^^^^ >>>> >>>> I don't think this saves the registers in the well known fixed location >>>> on the stack because some registers are saved at the exception entry >>>> point. >>> >>> The FPU exception entry points are not using the exception entry code in >>> head.S. These entry points are stored in the exception vector table directly. No >>> saving of a syscall stack frame happens there. The FPU places its exception >>> frame on the stack, and that is what the FPU exception handlers use. >>> >>> (If these have to call out to the generic exception handlers again, they will >>> build a minimal stack frame, see code in skeleton.S.) >>> >>> Calling fpsp040_die() is no different from calling a syscall that may need to >>> have access to the full stack frame. The 'fixed location' is just 'on the stack >>> before calling fpsp040_die()', again this is no different from calling >>> e.g. sys_fork() which does not take a pointer to the begin of the stack frame as >>> an argument. >>> >>> I must admit I never looked at how do_exit() figures out where the stack frame >>> containing the saved registers is stored, I just assumed it unwinds the stack up >>> to the point where the caller syscall was made, and works from there. The same >>> strategy ought to work here. >> >> For do_exit the part we need to be careful with is PTRACE_EVENT_EXIT, >> which means it is ptrace that we need to look at. >> >> For m68k the code in put_reg and get_reg finds the registers by looking >> at task->thread.esp0. > > Thanks, that's what I was missing here. >> >> I was expecting m68k to use the same technique as alpha which expects a >> fixed offset from task_stack_page(task). >> >> So your code will work if you add code to update task->thread.esp0 which >> is also known as THREAD_ESP0 in entry.S > > Shoving > > movel %sp,%curptr@(TASK_THREAD+THREAD_ESP0) > > in between the SAVE_ALL_INT and SAVE_SWITCH_STACK ought to do the > trick there. > >> >>>> Without being saved at the well known fixed location if some process >>>> stops in PTRACE_EVENT_EXIT in do_exit we likely get some complete >>>> gibberish. >>>> >>>> That is probably safe. >>>> >>>>> jbra fpsp040_die >>>>> + addql #8,%sp >>>>> + addql #8,%sp >>>>> + addql #8,%sp >>>>> + addql #8,%sp >>>>> + addql #8,%sp >>>>> + addql #4,%sp >>>>> + rts >>>> Especially as everything after jumping to fpsp040_die does not execute. >>> >>> Unless we change fpsp040_die() to call force_sig(SIGSEGV). >> >> Yes. I think we would probably need to have it also call get_signal and >> all of that, because I don't think the very light call path for that >> exception includes testing if signals are pending. > > As far as I can see, there is a test for pending signals: > > ENTRY(ret_from_exception) > .Lret_from_exception: > btst #5,%sp@(PT_OFF_SR) | check if returning to kernel > bnes 1f | if so, skip resched, signals > | only allow interrupts when we are really the last one on the > | kernel stack, otherwise stack overflow can occur during > | heavy interrupt load > andw #ALLOWINT,%sr > > resume_userspace: > movel %curptr@(TASK_STACK),%a1 > moveb %a1@(TINFO_FLAGS+3),%d0 | bits 0-7 of TINFO_FLAGS > jne exit_work | any bit set? -> exit_work > 1: RESTORE_ALL > > exit_work: > | save top of frame > movel %sp,%curptr@(TASK_THREAD+THREAD_ESP0) > lslb #1,%d0 | shift out TIF_NEED_RESCHED > jne do_signal_return | any remaining bit > (signal/notify_resume)? -> do_signal_return > pea resume_userspace > jra schedule > > As long as TIF_NOTIFY_SIGNAL or TIF_SIGPENDING are set, > do_signal_return will be called. I was going to say I don't think so, as my tracing of the code lead in a couple of different directions. Upon closer inspection all those paths either lead to fpsp_done or more directly to ret_from_exception. For anyone else who might want to trace the code, or for myself later on when I forget. As best as I can figure the hardware exception vector table is setup in: arch/m68k/kernel/vector.c For the vectors in question it appears to be this chunk of code: if (CPU_IS_040 && !FPU_IS_EMU) { /* set up FPSP entry points */ asmlinkage void dz_vec(void) asm ("dz"); asmlinkage void inex_vec(void) asm ("inex"); asmlinkage void ovfl_vec(void) asm ("ovfl"); asmlinkage void unfl_vec(void) asm ("unfl"); asmlinkage void snan_vec(void) asm ("snan"); asmlinkage void operr_vec(void) asm ("operr"); asmlinkage void bsun_vec(void) asm ("bsun"); asmlinkage void fline_vec(void) asm ("fline"); asmlinkage void unsupp_vec(void) asm ("unsupp"); vectors[VEC_FPDIVZ] = dz_vec; vectors[VEC_FPIR] = inex_vec; vectors[VEC_FPOVER] = ovfl_vec; vectors[VEC_FPUNDER] = unfl_vec; vectors[VEC_FPNAN] = snan_vec; vectors[VEC_FPOE] = operr_vec; vectors[VEC_FPBRUC] = bsun_vec; vectors[VEC_LINE11] = fline_vec; vectors[VEC_FPUNSUP] = unsupp_vec; } Which leads me to call traces that look like this: hw fline fpsp_fline mem_read user_read copyin in_ea <page-fault> fpsp040_die If that mem_read returns it can be followed by not_mvcr real_fline ret_from_exception Or it can be followed by fix_con uni_2 gen_except do_clean finish_up fpsp_done ret_from_exception >> The way the code is structured it is actively incorrect to return from >> fpsp040_die, as the code does not know what to do if it reads a byte >> from userspace and there is nothing there. > > Correct - my hope is that upon return from the FPU exception (that > continued after a dodgy read or write), we get the signal delivered > and will die then. Yes. That does look like a good strategy. I am wondering if there are values we can return that will make the path out of the exit routine more deterministic. I have played with that a little bit today, but it doesn't look like I am going to have time to put together any kind of real patch today. Simply modifying fpsp040_die to call force_sigsegv(SIGSEGV) should be enough to trigger a signal (no call stack work needed if we remove do_exit). The tricky bit is what value do we want to fake when we can not read anything from userspace. For a write fault we should just be able to skip the write entirely. In both cases we probably should break out of the loop prematurely. But I don't know if that is necessary. The lazy strategy would be to copy the ifpsp060 code and simply oops the kernel if the read or write of userspace gets a page fault. >> So instead of handling -EFAULT like most pieces of kernel code the code >> just immediately calls do_exit, and does not even attempt to handle >> the error. >> >> That is not my favorite strategy at all, but I suspect it isn't worth >> it, or safe to update the skeleton.S to handle errors. Especially as we >> have not even figured out how to test that code yet. > > That's bothering me more than a little, but I need to find out whether > the emulator even handles FPU exceptions correctly ... As a fallback plan we can following the lead of ifpsp060/os.S and simply not catch the kernel triggered page fault, and let arch/m68k/mm/fault.c:send_fault_sig() return a kernel oops. It is not ideal as it allows userspace to trigger a kernel oops, but it does at least keep the kernel in a consistent state. diff --git a/arch/m68k/fpsp040/skeleton.S b/arch/m68k/fpsp040/skeleton.S index a8f41615d94a..4c6c4b07ef38 100644 --- a/arch/m68k/fpsp040/skeleton.S +++ b/arch/m68k/fpsp040/skeleton.S @@ -479,7 +479,6 @@ copyout: | movec %d1,%DFC | set dfc for user data space moreout: moveb (%a0)+,%d1 | fetch supervisor byte -out_ea: movesb %d1,(%a1)+ | write user byte dbf %d0,moreout rts @@ -493,21 +492,9 @@ copyin: | SFC is already set | movec %d1,%SFC | set sfc for user space morein: -in_ea: movesb (%a0)+,%d1 | fetch user byte moveb %d1,(%a1)+ | write supervisor byte dbf %d0,morein rts - .section .fixup,#alloc,#execinstr - .even -1: - jbra fpsp040_die - - .section __ex_table,#alloc - .align 4 - - .long in_ea,1b - .long out_ea,1b - |end diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c index e2a6f3556211..3ec6ae1bdaf9 100644 --- a/arch/m68k/kernel/traps.c +++ b/arch/m68k/kernel/traps.c @@ -1144,15 +1144,6 @@ asmlinkage void set_esp0(unsigned long ssp) current->thread.esp0 = ssp; } -/* - * This function is called if an error occur while accessing - * user-space from the fpsp040 code. - */ -asmlinkage void fpsp040_die(void) -{ - do_exit(SIGSEGV); -} - #ifdef CONFIG_M68KFPU_EMU asmlinkage void fpemu_signal(int signal, int code, void *addr) { Eric