Re: core dump analysis, was Re: stack smashing detected

Michael Schmitz <schmitzmic@xxxxxxxxx> · Wed, 19 Apr 2023 20:15:07 +1200

Hi Finn,

On 19.04.2023 at 13:50, Finn Thain wrote:
I would have expected to see a different signal trampoline (for
sys_rt_sigreturn) ...

Well, this seems to be the trampoline from setup_frame() and not
setup_rt_frame().

According to the manpages I've seen, glibc ought to pick rt signals if
the kernel supports those (which I suppose it does).

It's got to be the trampoline from setup_frame() because dash did this:

        act.sa_flags = 0;
        sigfillset(&act.sa_mask);
        sigaction(signo, &act, 0);

Ah - dash explicitly requests the old format. Makes sense then.

and the kernel did this:

        /* set up the stack frame */
        if (ksig->ka.sa.sa_flags & SA_SIGINFO)
                err = setup_rt_frame(ksig, oldset, regs);
        else
                err = setup_frame(ksig, oldset, regs);
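To illustrate that flag check from userspace, here is a minimal sketch, assuming a POSIX system. The helper names install_classic(), install_siginfo() and wants_rt_frame() are made up for this example and come from neither dash nor the kernel. It installs a handler the way dash does (sa_flags = 0), reads the disposition back, and reports whether SA_SIGINFO is set, which is the bit the kernel tests when choosing between setup_frame() and setup_rt_frame().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical one-argument handler, as used without SA_SIGINFO. */
static void classic_handler(int signo)
{
        (void)signo;
}

/* Hypothetical three-argument handler, as used with SA_SIGINFO. */
static void siginfo_handler(int signo, siginfo_t *info, void *uctx)
{
        (void)signo;
        (void)info;
        (void)uctx;
}

/* Install a handler the way the dash snippet does: sa_flags = 0. */
static int install_classic(int signo)
{
        struct sigaction act;

        memset(&act, 0, sizeof(act));
        act.sa_handler = classic_handler;
        act.sa_flags = 0;
        sigfillset(&act.sa_mask);
        return sigaction(signo, &act, NULL);
}

/* Install a handler that explicitly asks for siginfo delivery. */
static int install_siginfo(int signo)
{
        struct sigaction act;

        memset(&act, 0, sizeof(act));
        act.sa_sigaction = siginfo_handler;
        act.sa_flags = SA_SIGINFO;
        sigfillset(&act.sa_mask);
        return sigaction(signo, &act, NULL);
}

/*
 * Read back the current disposition. Returns 1 if SA_SIGINFO is set,
 * 0 if it is clear, -1 if sigaction() fails.
 */
static int wants_rt_frame(int signo)
{
        struct sigaction cur;

        if (sigaction(signo, NULL, &cur) != 0)
                return -1;
        return (cur.sa_flags & SA_SIGINFO) != 0;
}
```

Calling install_classic(SIGUSR1) and then wants_rt_frame(SIGUSR1) should report 0, i.e. the case that ends up in setup_frame(); after install_siginfo(SIGUSR1) it should report 1.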

But anyway:

The saved pc is 0xc00e81b6 which does match the backtrace above.
Vector offset 80 matches trap 0 which suggests 0xc00e81b6 should be
the instruction after a trap 0 instruction. d0 is 1055 which is not a
signal number I recognize.
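The vector arithmetic can be sketched as follows. This assumes the "80" above is hexadecimal and uses the usual 680x0 layout of the format/vector word (frame format in bits 15-12, vector offset in bits 11-0, vector number = offset / 4, TRAP #n at vectors 32 to 47). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Decoded format/vector word from a 680x0 exception frame. */
struct fmt_vec {
        unsigned format;        /* bits 15-12: stack frame format */
        unsigned offset;        /* bits 11-0: vector offset in bytes */
        unsigned vector;        /* vector number, offset / 4 */
};

static struct fmt_vec decode_format_vector(uint16_t word)
{
        struct fmt_vec fv;

        fv.format = word >> 12;
        fv.offset = word & 0x0fff;
        fv.vector = fv.offset / 4;
        return fv;
}

/* Returns n for TRAP #n (vectors 32..47), or -1 for any other vector. */
static int trap_number(unsigned vector)
{
        if (vector >= 32 && vector <= 47)
                return (int)vector - 32;
        return -1;
}
```

With these definitions, a format/vector word of 0x0080 decodes to format 0, vector 32, which trap_number() maps to TRAP #0. A word of 0xb008 would decode to format 0xb at vector 2 (bus error).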

I don't know what d0 represents here. But &frame->sig == 0x11 is
correct (SIGCHLD).

Correct - that all works out. But d0 holds the syscall number when we
enter the kernel via trap 0, and that one is odd.

Well, you showed subsequently that the kernel was probably entered via a
page fault and not the get_thread_area trap. Would that explain the d0
value?

That d0 was from the dash under gdb run. But I got my signal delivery 
mixed up - d0 is only expected to hold the syscall number when we issue 
a syscall. That would be in the child process, not the parent which we 
debug.

d0 is just whatever the parent had in its register when it started 
do_signal_return after exception or syscall. On return after syscall, d0 
holds the task info flags, maybe that's what we see here.

See above - I think what's stored there is the extra frame content for a
format b bus error frame. But that extra frame is incomplete at best
(should be 22 longwords, only 14 are seen). Probably overwritten by the
stack frame from __GI___wait4_time64.

Maybe the exception frame leaked onto the user stack via setup_frame()?

Yes, for exception frames larger than four words the excess is copied 
after the end of the sigcontext block.

Let's parse what's left:
<=
0xefffefe4:     0xc0028780        <= internal registers (6x)
0xefffefe0:     0x3c344bfb        <=
0xefffefdc:     0x000af353        <=
0xefffefd8:     0x3c340170        <= internal reg; version no.
0xefffefd4:     0x00000000        <= data input buffer
0xefffefd0:     0xc00e417c        <= internal registers (2x)
0xefffefcc:     0xc00e417e        <= stage b address
0xefffefc8:     0xc00e4180        <= internal registers (4x)
0xefffefc4:     0x48e73c34        <=
0xefffefc0:     0x00000000        <= data output buffer
0xefffefbc:     0xefffeff8        <= internal registers (2x)
0xefffefb8:     0xefffeffc        <= data fault address
0xefffefb4:     0x4bfb0170        <= ins stage c, stage b
0xefffefb0:     0x0eee0709        <= internal register; ssw

The fault address is the location on the stack where a2 is saved. That
does match the data output buffer contents BTW. fc, fb, rc, rb bits
clear means the fault didn't occur in stage b or c instructions. ssw bit
8 set indicates a data fault - the data cycle should be rerun on rte. rm
and rw bits clear tell us it's a write fault. If the moveml instruction
copies registers to the stack in descending order, the fault address
makes sense - the stack pointer just crossed a page boundary.
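For reference, the ssw reading above can be reproduced with a small decoder. This is only a sketch: the bit positions follow the 68030 special status word layout (FC 15, FB 14, RC 13, RB 12, DF 8, RM 7, RW 6, SIZE 5-4, function code 2-0), the 4 KiB page size is an assumption about this kernel configuration, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SSW_FC   0x8000u  /* fault on stage C of the instruction pipe */
#define SSW_FB   0x4000u  /* fault on stage B of the instruction pipe */
#define SSW_RC   0x2000u  /* rerun flag for stage C */
#define SSW_RB   0x1000u  /* rerun flag for stage B */
#define SSW_DF   0x0100u  /* data fault, rerun data cycle on rte */
#define SSW_RM   0x0080u  /* read-modify-write cycle */
#define SSW_RW   0x0040u  /* 1 = read, 0 = write */
#define SSW_SIZE 0x0030u  /* transfer size code */
#define SSW_FCODE 0x0007u /* function code of the faulting access */

/* Assumption: 4 KiB pages. */
#define ASSUMED_PAGE_SIZE 0x1000u

struct ssw_info {
        bool fc, fb, rc, rb;    /* instruction pipe fault/rerun bits */
        bool df;                /* data fault */
        bool rm;                /* read-modify-write */
        bool read;              /* true = read cycle, false = write */
        unsigned size_code;     /* 0 means long word on the 68030 */
        unsigned fcode;         /* 1 = user data space */
};

static struct ssw_info decode_ssw(uint16_t ssw)
{
        struct ssw_info s;

        s.fc = (ssw & SSW_FC) != 0;
        s.fb = (ssw & SSW_FB) != 0;
        s.rc = (ssw & SSW_RC) != 0;
        s.rb = (ssw & SSW_RB) != 0;
        s.df = (ssw & SSW_DF) != 0;
        s.rm = (ssw & SSW_RM) != 0;
        s.read = (ssw & SSW_RW) != 0;
        s.size_code = (ssw & SSW_SIZE) >> 4;
        s.fcode = ssw & SSW_FCODE;
        return s;
}

/*
 * True if addr is the last longword of its page, i.e. the first
 * longword a descending store sequence touches after leaving the
 * page above it.
 */
static bool is_last_longword_of_page(uint32_t addr)
{
        return (addr & (ASSUMED_PAGE_SIZE - 1)) == ASSUMED_PAGE_SIZE - 4;
}
```

Feeding it the low word of 0x0eee0709 (0x0709) gives df set, all of fc/fb/rc/rb clear, rm clear, a write cycle, size code 0 and function code 1 (user data). is_last_longword_of_page(0xefffeffc) is true under the 4 KiB assumption, which fits a descending moveml store having just stepped onto a new page.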

Well spotted!

Bottom line is, the corrupted %a3 register would have been saved by
the MOVEM instruction at 0xc00e4178, which turns out to be the PC in
the signal frame. So it certainly looks like the kernel was the
culprit here.

I think the moveml instruction did cause a bus error, and on return from
that exception the signal got delivered.

Maybe the signal frame was partially overwritten by the resumed MOVEM?

That's possible - the saved usp in the signal frame is that of the first 
register saved to the stack (before the page fault).

I wonder what we'd see if we patched the kernel to log every user data
write fault caused by a MOVEM instruction. I'll try to code that up.

If these instructions did always cause stack corruption on 030, I think 
we would have noticed long ago?

On entering the buserror handler, only a1 and a2 are saved, but the
comment in entry.h states that a3-a6 and d6, d7 are preserved by C code.
After buserr_c returns, a3 should be restored to what it was when taking
the bus error. With all registers restored before rte, the moveml
instruction ought to be able to resume normally.

Unless that register use constraint has changed, I don't see how a3
could have changed midway during return from the bus error exception.
But maybe a disassembly of buserr_c from your kernel could confirm that?

I disassembled the relevant build. AFAICT, buserr_c() saves and restores
those registers in the right places.

BTW, I've reproduced the failures with kernels built with both GCC 12 and
GCC 6.

Thanks - that was highly unlikely but had to be checked.

That leaves the possibility that some kernel bug did corrupt the saved a3
copy in struct switch_stack... but that is not used in bus error
exceptions. And the only other use of a3 is in ret_from_kernel_thread
which is called only from copy_thread() ...

Still baffled...

Cheers,

	Michael