Re: stack smashing detected

Michael Schmitz <schmitzmic@xxxxxxxxx> · Thu, 9 Feb 2023 16:41:28 +1300

Hi Stan,

On 08.02.2023 at 11:58, Michael Schmitz wrote:
Thanks Stan,

On 8/02/23 08:37, Stan Johnson wrote:
Hi Michael,

On 2/5/23 3:19 PM, Michael Schmitz wrote:
...

Seeing Finn's report that Al Viro's VM_FAULT_RETRY fix may have solved
his task corruption troubles on 040, I just noticed that I probably
misunderstood how Al's patch works.

Botching up a fault retry and carrying on may well leave the page tables
in a state where some later access could go to the wrong page and
manifest as user space corruption. Could you try Al's patch 4 (m68k: fix
livelock in uaccess) to see if this helps?
...
ok, this appears to be the patch:

Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
---
  arch/m68k/mm/fault.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index 4d2837eb3e2a..228128e45c67 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -138,8 +138,11 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
      fault = handle_mm_fault(vma, address, flags, regs);
      pr_debug("handle_mm_fault returns %x\n", fault);

-    if (fault_signal_pending(fault, regs))
+    if (fault_signal_pending(fault, regs)) {
+        if (!user_mode(regs))
+            goto no_context;
          return 0;
+    }

      /* The fault is fully completed (including releasing mmap lock) */
      if (fault & VM_FAULT_COMPLETED)

That's correct.
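
To make the changed branch easier to reason about, here is a minimal user-space model of it. This is only a sketch: the enum, its labels and the function name are hypothetical and do not exist in the kernel. It mirrors only the if/else structure of the hunk above, on the assumption that "goto no_context" leads to the exception-table fixup path and "return 0" simply returns from the fault handler.

```c
#include <assert.h>   /* only needed if you add assert()-based checks */
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not kernel identifiers. */
enum fault_outcome {
    OUTCOME_RETURN,      /* models "return 0" in the patched hunk */
    OUTCOME_NO_CONTEXT,  /* models "goto no_context" (assumed fixup path) */
    OUTCOME_CONTINUE     /* no signal pending: handler carries on */
};

/*
 * Model of the patched branch in do_page_fault():
 *   signal_pending stands in for fault_signal_pending(fault, regs),
 *   user_mode      stands in for user_mode(regs).
 */
enum fault_outcome model_signal_pending_branch(bool signal_pending,
                                               bool user_mode)
{
    if (signal_pending) {
        if (!user_mode)
            return OUTCOME_NO_CONTEXT;  /* new with the patch */
        return OUTCOME_RETURN;          /* unchanged behaviour */
    }
    return OUTCOME_CONTINUE;
}
```

Before the patch, the kernel-mode case with a pending signal would have taken the plain return as well. As far as I understand it, that is the case the patch title calls a livelock in uaccess.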

Your results show improvement but the problem does not entirely go away.

Looking at differences between 030 and 040/060 fault handling, it
appears only 030 treats faults corrected by exception tables (such as
those used in the uaccess macros) specially, i.e. it aborts bus error
processing, while 040 and 060 carry on in the fault handler.

I wonder if that's the main difference between 030 and 040 behaviour?

Following the 040 code a bit further, I suspect that happens in the 040 
writeback handler, so this may be a red herring.
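
For readers not familiar with exception tables, the idea can be sketched in user space roughly as follows. This is an illustration under stated assumptions, not the kernel's implementation: the struct layout, the linear search and all names here are hypothetical. The only detail taken from the 030 code is that a negative return value is what signals "an exception was handled", so the caller stops processing the bus error.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified exception table entry: the address of an
 * instruction that is allowed to fault, and the address to resume at.
 */
struct ex_entry {
    unsigned long insn;
    unsigned long fixup;
};

/* Return the fixup address registered for pc, or 0 if there is none. */
unsigned long find_fixup(const struct ex_entry *table, size_t n,
                         unsigned long pc)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].insn == pc)
            return table[i].fixup;
    }
    return 0;
}

/*
 * Model of a kernel-mode fault that cannot be resolved normally.
 * Returns -1 and redirects *pc when a fixup exists (the "exception
 * handled" case); returns 0 and leaves *pc alone when none is found.
 * The 0 is a convention of this sketch only.
 */
int model_kernel_fault(const struct ex_entry *table, size_t n,
                       unsigned long *pc)
{
    unsigned long fixup = find_fixup(table, n, *pc);

    if (fixup != 0) {
        *pc = fixup;
        return -1;
    }
    return 0;
}
```

In this model, a caller that sees a negative return simply stops, which corresponds to the early return after do_page_fault() in bus_error030().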

I'll try and log such accesses caught by exception tables on 030 to see
if they are rare enough to allow adding a kernel log message...

Looks like this kind of event is rare enough to not trigger in a normal 
boot on my 030. Please give the attached patch a try so we can confirm 
(or rule out) that user space access faults from kernel mode are to 
blame for your stack smashes.

Cheers,

	Michael




From a55467a02b66addca6f74fc32b473bc077cb34b2 Mon Sep 17 00:00:00 2001
From: Michael Schmitz <schmitzmic@xxxxxxxxx>
Date: Thu, 9 Feb 2023 14:39:35 +1300
Subject: [PATCH] m68k: debug exception handling data faults on 030

030 faults handled by exception tables are just silently ignored - see how
many of these do happen in practice, and if they are related to 'stack
smashing' faults.

Signed-off-by: Michael Schmitz <schmitzmic@xxxxxxxxx>
---
 arch/m68k/kernel/traps.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
index 5c8cba0efc63..b3cef760f7e8 100644
--- a/arch/m68k/kernel/traps.c
+++ b/arch/m68k/kernel/traps.c
@@ -554,8 +554,13 @@ static inline void bus_error030 (struct frame *fp)
 			}
 			/* Don't try to do anything further if an exception was
 			   handled. */
-			if (do_page_fault (&fp->ptregs, addr, errorcode) < 0)
+			if (do_page_fault (&fp->ptregs, addr, errorcode) < 0) {
+				pr_err("Exception handled for data %s fault at %#010lx in %s (pc=%#lx)\n",
+				       ssw & RW ? "read" : "write",
+				       fp->un.fmtb.daddr,
+				       space_names[ssw & DFC], fp->ptregs.pc);
 				return;
+			}
 		} else if (!(mmusr & MMU_I)) {
 			/* probably a 020 cas fault */
 			if (!(ssw & RM) && send_fault_sig(&fp->ptregs) > 0)
-- 
2.17.1