Re: signal delivery, was Re: reliable reproducer

Eero Tamminen <oak@xxxxxxxxxxxxxx> · Tue, 25 Apr 2023 12:26:13 +0300

Hi,

On 25.4.2023 4.55, Finn Thain wrote:
On Tue, 25 Apr 2023, Finn Thain wrote:
...
I wonder if we are seeing some fallout from the issue described in
do_page_fault() i.e. usp is unreliable.

                 /* Accessing the stack below usp is always a bug.  The
                    "+ 256" is there due to some instructions doing
                    pre-decrement on the stack and that doesn't show up
                    until later.  */
                 if (address + 256 < rdusp())
                         goto map_err;

Maybe we should try modifying get_sigframe() to increase the gap between
the signal and exception frames from 0-1 long words up to 64-65 long
words.

It turns out that doing so (patch below) does make the problem go away.
Was the exception frame getting clobbered?

diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
index b9f6908a31bc..94104699f5a8 100644
--- a/arch/m68k/kernel/signal.c
+++ b/arch/m68k/kernel/signal.c
@@ -862,7 +862,7 @@ get_sigframe(struct ksignal *ksig, size_t frame_size)
  {
  	unsigned long usp = sigsp(rdusp(), ksig);
  
-	return (void __user *)((usp - frame_size) & -8UL);
+	return (void __user *)((usp - 256 - frame_size) & -8UL);
  }
  
  static int setup_frame(struct ksignal *ksig, sigset_t *set,
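
To make the "0-1 long words" versus "64-65 long words" arithmetic concrete, here is a minimal userspace sketch of the two computations. It is not kernel code: the helper names, the uint32_t stand-in for the 32-bit m68k unsigned long, and any usp or frame_size values fed to it are assumptions for illustration only. Masking with ~7 is the 32-bit equivalent of the patch's "& -8UL".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the unpatched get_sigframe() arithmetic:
 * place the frame directly below usp, rounded down to 8 bytes. */
static inline uint32_t sigframe_orig(uint32_t usp, size_t frame_size)
{
	return (usp - (uint32_t)frame_size) & ~(uint32_t)7;
}

/* Hypothetical stand-in for the patched arithmetic: leave an extra
 * 256 bytes (64 long words) below usp before placing the frame. */
static inline uint32_t sigframe_padded(uint32_t usp, size_t frame_size)
{
	return (usp - 256u - (uint32_t)frame_size) & ~(uint32_t)7;
}

/* Bytes left unused between the top of the signal frame and usp. */
static inline uint32_t gap_bytes(uint32_t usp, uint32_t frame,
				 size_t frame_size)
{
	return usp - (frame + (uint32_t)frame_size);
}
```

With a long-word aligned usp and frame_size, the gap from sigframe_orig() is 0 or 4 bytes, and from sigframe_padded() it is 256 or 260 bytes, which matches the long-word counts quoted above.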

While this is most likely a Hatari emulation [1] issue, it shares some of
the same triggering conditions, so I thought I'd mention it...

The above patch does not fix the kernel panic I'm seeing when booting
Linux under a Hatari-emulated Atari Falcon, with a small IDE root fs
containing just an (old Debian) Busybox and a shell script acting as init:
----------------------------------------
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
CPU: 0 PID: 1 Comm: sh Not tainted 6.2.0hatari-00006-g01793428cbc5-dirty #4
Stack from 00819dc8:
        00819dc8 0032b646 0032b646 00027800 00000001 00819de8 00292878 0032b646
        00819e0c 0028d60e 00027848 0000000b 0081caff 000300d6 00815a90 00000000
        00819f2c 00819e50 00028878 003243fd 0000000b 0000000b 00818007 0081ca30
        000300d6 0081caf8 00000001 00819f18 0081bbd0 00819f2c 00000000 00000000
        00000000 00000000 00819e60 00028ea0 0000000b 0000000b 00819e98 000303c8
        0000000b 00818000 00000002 00000000 00000000 00000000 8017705e 00819f78
Call Trace: [<00027800>] set_cpu_online+0x1c/0x3e
 [<00292878>] dump_stack+0x10/0x16
 [<0028d60e>] panic+0xc4/0x22a
 [<00027848>] arch_local_irq_enable+0x0/0x22
 [<000300d6>] do_signal_stop+0x0/0x152
 [<00028878>] do_exit+0x138/0x642
 [<000300d6>] do_signal_stop+0x0/0x152
 [<00028ea0>] do_group_exit+0x22/0x62
 [<000303c8>] get_signal+0xf8/0x4ba
 [<00003508>] test_ti_thread_flag+0x0/0x1a
 [<00003f4a>] do_notify_resume+0x36/0x488
 [<00005706>] send_fault_sig+0x28/0x8c
 [<00005888>] do_page_fault+0x11e/0x242
 [<00005814>] do_page_fault+0xaa/0x242
 [<00002814>] do_signal_return+0x10/0x1a
 [<00020007>] _I_CALL_TOP+0xd83/0x1900
 [<0000b280>] sp_over+0x2c/0x3c
 [<00007201>] atari_irq_enable+0x3/0x2a
 [<000066f6>] atari_get_hardware_list+0x33a/0x3e8
----------------------------------------

(The only way to get rid of the panic is to disable both CPU cache and
prefetch emulation.)


Is it possible that in your case there's also an IRQ (exception) happening
at the same time as the page fault and signal?


	- Eero

[1] 030 MMU vs. cache/prefetch vs. exception handling vs. IDE emulation.