On Tue, 18 Apr 2023, Michael Schmitz wrote:
On 18.04.2023 at 14:04, Finn Thain wrote:
On Tue, 18 Apr 2023, Michael Schmitz wrote:
On 16.04.2023 at 18:44, Finn Thain wrote:
0xeffff750: 0xc01a0000 saved $a5 == libc .got
0xeffff74c: 0xc0023e8c saved $a3 == &__stack_chk_guard
0xeffff748: 0x00000000 saved $a2
0xeffff744: 0x00000001 saved $d5
0xeffff740: 0xeffff86e saved $d4
0xeffff73c: 0xeffff86a saved $d3
0xeffff738: 0x00000002 saved $d2
0xeffff734: 0x00000000
0xeffff730: 0x00000000
0xeffff72c: 0x00000000
0xeffff728: 0x00000000
0xeffff724: 0x00000000
0xeffff720: 0x00000000
0xeffff71c: 0x00000000
0xeffff718: 0x00000000
0xeffff714: 0x00000000
0xeffff710: 0x00000000
0xeffff70c: 0x00000000
0xeffff708: 0x00000000
0xeffff704: 0x00000000
0xeffff700: 0x00000000
0xeffff6fc: 0x00000000
0xeffff6f8: 0x00000000
0xeffff6f4: 0x00000000
0xeffff6f0: 0x00000000
0xeffff6ec: 0x00000000
0xeffff6e8: 0x00000000
0xeffff6e4: 0x00000000
0xeffff6e0: 0x00000000
0xeffff6dc: 0x00000000
0xeffff6d8: 0x00000000
0xeffff6d4: 0x00000000
0xeffff6d0: 0x00000000
0xeffff6cc: 0x00000000
0xeffff6c8: 0x00000000
0xeffff6c4: 0x00000000
0xeffff6c0: 0x00000000
0xeffff6bc: 0x00000000
0xeffff6b8: 0x00000000
0xeffff6b4: 0x00000000
0xeffff6b0: 0x00000000
0xeffff6ac: 0x00000000
0xeffff6a8: 0x00000000
0xeffff6a4: 0x00000000
0xeffff6a0: 0x00000000
0xeffff69c: 0x00000000
0xeffff698: 0x00000000
0xeffff694: 0x00000000
0xeffff690: 0x00000000
0xeffff68c: 0x00000000
0xeffff688: 0x00000000
0xeffff684: 0x00000000
0xeffff680: 0x00000000
0xeffff67c: 0x00000000
0xeffff678: 0x00000000
0xeffff674: 0x00000000
0xeffff670: 0x00000000
0xeffff66c: 0x00000000
0xeffff668: 0x00000000
0xeffff664: 0x00000000
0xeffff660: 0x41000000
0xeffff65c: 0x00000000
0xeffff658: 0x00000000
0xeffff654: 0x00000000
0xeffff650: 0x00000000
0xeffff64c: 0x80000000
0xeffff648: 0x3fff0000
0xeffff644: 0x00000000
0xeffff640: 0xd0000000
0xeffff63c: 0x40020000 <= (sc.formatvec & 0xffff) << 16; fpregs from here on
0xeffff638: 0x81b60080 <= (sc.pc & 0xffff) << 16 | sc.formatvec >> 16
0xeffff634: 0x0000c00e <= sc.sr << 16 | sc.pc >> 16
0xeffff630: 0xd001e4e3 <= sc.a1
0xeffff62c: 0xc0028780 <= sc.a0
0xeffff628: 0xffffffff <= sc.d1
0xeffff624: 0x0000041f <= sc.d0
0xeffff620: 0xeffff738 <= sc.usp
0xeffff61c: 0x00000000 <= sc.mask
0xeffff618: 0x00000000 <= extramask
0xeffff614: 0x00000000 <= frame.retcode[1]
0xeffff610: 0x70774e40 moveq #119,%d0 ; trap #0
0xeffff60c: 0xeffff61c <= frame->sc
0xeffff608: 0x00000080 <= tregs->vector
0xeffff604: 0x00000011 <= signal no.
0xeffff600: 0xeffff610 return address
The above comes from dash running under gdb under qemu, which does
not exhibit the failure but is convenient for that kind of
experiment.
I would have expected to see a different signal trampoline (for
sys_rt_sigreturn) ...
Well, this seems to be the trampoline from setup_frame() and not
setup_rt_frame().
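For reference, the trampoline longword in the dump (0x70774e40) can be picked apart mechanically. The following is only a sketch: the struct and function names are made up for illustration, and it relies on the usual 68k encodings (MOVEQ as 0111 rrr0 dddddddd, TRAP #n as 0x4e40 | n).

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded form of a "moveq #imm,%dN ; trap #n" pair (hypothetical helper type). */
struct trampoline {
    int8_t  imm;      /* sign-extended MOVEQ immediate, i.e. the syscall number */
    uint8_t dreg;     /* destination data register of MOVEQ */
    uint8_t trap_no;  /* TRAP argument */
};

/*
 * Decode one longword as it appears in the stack dump, e.g. 0x70774e40.
 * Returns false if the longword is not a MOVEQ followed by a TRAP.
 */
static bool decode_trampoline(uint32_t word, struct trampoline *out)
{
    uint16_t hi = (uint16_t)(word >> 16);
    uint16_t lo = (uint16_t)(word & 0xffff);

    if ((hi & 0xf100) != 0x7000)    /* not MOVEQ */
        return false;
    if ((lo & 0xfff0) != 0x4e40)    /* not TRAP */
        return false;

    out->imm = (int8_t)(hi & 0xff);
    out->dreg = (uint8_t)((hi >> 9) & 7);
    out->trap_no = (uint8_t)(lo & 0xf);
    return true;
}
```

Fed 0x70774e40, this yields immediate 119 in d0 followed by trap #0, which agrees with the "moveq #119,%d0 ; trap #0" annotation in the dump.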
According to the manpages I've seen, glibc ought to pick rt signals if
the kernel supports those (which I suppose it does).
It's got to be the trampoline from setup_frame() because dash did this:
    act.sa_flags = 0;
    sigfillset(&act.sa_mask);
    sigaction(signo, &act, 0);
and the kernel did this:
    /* set up the stack frame */
    if (ksig->ka.sa.sa_flags & SA_SIGINFO)
        err = setup_rt_frame(ksig, oldset, regs);
    else
        err = setup_frame(ksig, oldset, regs);
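The same decision can be observed from user space. This is a minimal sketch, assuming a POSIX system; the helper names install_handler() and wants_rt_frame() are hypothetical and not taken from dash or the kernel. It installs a handler with or without SA_SIGINFO and reads the flag back, which is the bit the kernel code above tests.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static void on_signal(int signo) { (void)signo; }

static void on_siginfo(int signo, siginfo_t *info, void *uctx)
{
    (void)signo; (void)info; (void)uctx;
}

/* Install a handler with a full mask, as dash does; use_siginfo selects SA_SIGINFO. */
static int install_handler(int signo, int use_siginfo)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    if (use_siginfo) {
        act.sa_sigaction = on_siginfo;
        act.sa_flags = SA_SIGINFO;
    } else {
        act.sa_handler = on_signal;
        act.sa_flags = 0;
    }
    sigfillset(&act.sa_mask);
    return sigaction(signo, &act, NULL);
}

/* 1 if the current disposition requests the rt frame, 0 if not, -1 on error. */
static int wants_rt_frame(int signo)
{
    struct sigaction cur;

    if (sigaction(signo, NULL, &cur) != 0)
        return -1;
    return (cur.sa_flags & SA_SIGINFO) != 0;
}
```

With sa_flags = 0, as in the dash snippet, wants_rt_frame() reports 0, so the non-rt frame and its sigreturn trampoline are what one would expect on the stack.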
But anyway:
The saved pc is 0xc00e81b6 which does match the backtrace above.
Vector offset 0x80 matches trap 0, which suggests 0xc00e81b6 should be
the instruction after a trap 0 instruction. d0 is 1055 (0x41f), which is
not a signal number I recognize.
I don't know what d0 represents here. But frame->sig == 0x11 is
correct (SIGCHLD).
Correct - that all works out. But d0 holds the syscall number when we
enter the kernel via trap 0, and that one is odd.
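Because sc.sr and sc.formatvec are 16 bits wide, sc.pc straddles two longwords in the dump. A small sketch of the reassembly follows; the struct and function names are invented for illustration, and the inputs are the longwords as printed (big-endian).

```c
#include <stdint.h>

/* The last three scalar fields of the sigcontext as discussed above. */
struct sc_tail {
    uint16_t sr;
    uint32_t pc;
    uint16_t formatvec;
};

/*
 * w_lo is the longword at the lower address (sr, then the high half of pc);
 * w_hi is the next longword up (low half of pc, then formatvec).
 */
static struct sc_tail decode_sc_tail(uint32_t w_lo, uint32_t w_hi)
{
    struct sc_tail t;

    t.sr = (uint16_t)(w_lo >> 16);
    t.pc = (w_lo << 16) | (w_hi >> 16);
    t.formatvec = (uint16_t)(w_hi & 0xffff);
    return t;
}
```

For the QEMU dump, decode_sc_tail(0x0000c00e, 0x81b60080) gives pc 0xc00e81b6 and formatvec 0x0080. The same helper applied to the core dump words 0x0008c00e and 0x4178b008 gives pc 0xc00e4178 and formatvec 0xb008.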
Well, you showed subsequently that the kernel was probably entered via a
page fault and not the get_thread_area trap. Would that explain the d0
value?
...
Here's some stack memory from the core dump.
0xeffff0dc: 0xd000c38e return address waitproc+124
0xeffff0d8: 0xd001c1ec frame 0 $fp == &suppressint
0xeffff0d4: 0x00add14b canary
0xeffff0d0: 0x00000000
0xeffff0cc: 0x0000000a
0xeffff0c8: 0x00000202
0xeffff0c4: 0x00000008
0xeffff0c0: 0x00000000
0xeffff0bc: 0x00000000
0xeffff0b8: 0x00000174
0xeffff0b4: 0x00000004
0xeffff0b0: 0x00000004
0xeffff0ac: 0x00000006
0xeffff0a8: 0x000000e0
0xeffff0a4: 0x000000e0
0xeffff0a0: 0x00171f20
0xeffff09c: 0x00171f20
0xeffff098: 0x00171f20
0xeffff094: 0x00000002
0xeffff090: 0x00002000
0xeffff08c: 0x00000006
0xeffff088: 0x0000e920
0xeffff084: 0x00005360
0xeffff080: 0x00170700
0xeffff07c: 0x00170700
0xeffff078: 0x00170700 frame 0 $fp - 96
0xeffff074: 0xd001b874 saved $a5 == dash .got
0xeffff070: 0xd001e498 saved $a3 == &dash_errno
0xeffff06c: 0xd001e718 frame 0 $sp saved $a2 == &gotsigchld
0xeffff068: 0x00000000
0xeffff064: 0x00000000
0xeffff060: 0xeffff11e
0xeffff05c: 0xffffffff
0xeffff058: 0xc00e4164 return address __wait3+244
0xeffff054: 0x00add14b canary
0xeffff050: 0x00000001
0xeffff04c: 0x00000004
0xeffff048: 0x0000000d
0xeffff044: 0x0000000d
0xeffff040: 0x0015ef82
0xeffff03c: 0x0015ef82
0xeffff038: 0x0015ef82
0xeffff034: 0x00000003
0xeffff030: 0x00000004
0xeffff02c: 0x00000004
0xeffff028: 0x00000140
0xeffff024: 0x00000140
0xeffff020: 0x00000034
0xeffff01c: 0x00000034
0xeffff018: 0x00000034
0xeffff014: 0x00000006
0xeffff010: 0x003b003a
0xeffff00c: 0x000a0028
0xeffff008: 0x00340020
0xeffff004: 0xc019c000 saved $a5 == libc .got
0xeffff000: 0xeffff068 saved $a3 (corrupted)
0xefffeffc: 0x00000000 saved $a2
0xefffeff8: 0x00000001 saved $d5
0xefffeff4: 0xeffff122 saved $d4
0xefffeff0: 0xeffff11e saved $d3
0xefffefec: 0x00000000 saved $d2
0xefffefe8: 0xc00e419a return address __GI___wait4_time64+38
0xefffefe4: 0xc0028780
0xefffefe0: 0x3c344bfb
0xefffefdc: 0x000af353
0xefffefd8: 0x3c340170
0xefffefd4: 0x00000000
0xefffefd0: 0xc00e417c
0xefffefcc: 0xc00e417e
0xefffefc8: 0xc00e4180
0xefffefc4: 0x48e73c34
0xefffefc0: 0x00000000
0xefffefbc: 0xefffeff8
0xefffefb8: 0xefffeffc
0xefffefb4: 0x4bfb0170
0xefffefb0: 0x0eee0709
0xefffefac: 0x00000000
0xefffefa8: 0x00000000
0xefffefa4: 0x00000000
0xefffefa0: 0x00000000
0xefffef9c: 0x00000000
0xefffef98: 0x00000000
0xefffef94: 0x00000000
0xefffef90: 0x00000000
0xefffef8c: 0x00000000
0xefffef88: 0x00000000
0xefffef84: 0x00000000
0xefffef80: 0x00000000
0xefffef7c: 0x00000000
0xefffef78: 0x00000000
0xefffef74: 0x00000000
0xefffef70: 0x00000000
0xefffef6c: 0x00000000
0xefffef68: 0x00000000
0xefffef64: 0x00000000
0xefffef60: 0x00000000
0xefffef5c: 0x00000000
0xefffef58: 0x00000000
0xefffef54: 0x00000000
0xefffef50: 0x00000000
0xefffef4c: 0x00000000
0xefffef48: 0x00000000
0xefffef44: 0x00000000
0xefffef40: 0x00000000
0xefffef3c: 0x00000000
0xefffef38: 0x00000000
0xefffef34: 0x00000000
0xefffef30: 0x00000000
0xefffef2c: 0x00000000
0xefffef28: 0x00000000
0xefffef24: 0x00000000
0xefffef20: 0x00000000
0xefffef1c: 0x00000000
0xefffef18: 0x00000000
0xefffef14: 0x00000000
0xefffef10: 0x7c0effff
0xefffef0c: 0xffffffff
0xefffef08: 0xaaaaaaaa
0xefffef04: 0xaf54eaaa
0xefffef00: 0x40040000
0xefffeefc: 0x40040000
0xefffeef8: 0x2b000000
0xefffeef4: 0x00000000
0xefffeef0: 0x00000000
0xefffeeec: 0x408ece9a
0xefffeee8: 0x00000000
0xefffeee4: 0xf0ff0000
0xefffeee0: 0x0f800000
0xefffeedc: 0xf0fff0ff
0xefffeed8: 0x1f380000
0xefffeed4: 0x00000000
0xefffeed0: 0x00000000
0xefffeecc: 0x00000000
0xefffeec8: 0xffffffff
0xefffeec4: 0xffffffff
0xefffeec0: 0x7fff0000
0xefffeebc: 0xffffffff
0xefffeeb8: 0xffffffff
0xefffeeb4: 0x7fff0000 sc_formatvec
The signal frame is not readily apparent (to me).
From looking at the above stack dump, sc ought to start at 0xefffee90,
and the trampoline would be three words below that.
0xefffeeb0: 0x4178b008 sc_pc, sc_formatvec
0xefffeeac: 0x0008c00e sc_sr, sc_pc
0xefffeea8: 0xd00223bb sc_a1
0xefffeea4: 0xd001e32c sc_a0
0xefffeea0: 0x00000003 sc_d1
0xefffee9c: 0xeffff11e sc_d0
0xefffee98: 0xeffff004 sc_usp
0xefffee94: 0x00000000 sc_mask
0xefffee90: 0x00000000 extramask
0xefffee8c: 0xc0024a90 retcode[1]
0xefffee88: 0x70774e40 retcode[0]
0xefffee84: 0xefffee94 psc
0xefffee80: 0x00000008 code
0xefffee7c: 0x00000011 sig
0xefffee78: 0xefffee88 pretcode
OK, that's our SIGCHLD. But the signal frame format is odd ...
Frame format b, vector offset 008. That's a bus error?
How does that get on the user mode stack?
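To make the format/vector reading explicit, here is a sketch that splits the format word into its parts. The helper names are hypothetical; the layout assumed is the standard 68k one (frame format in the top 4 bits, vector offset in the low 12 bits, vector number = offset / 4, bus error at vector 2, trap #0..#15 at vectors 32..47).

```c
#include <stdint.h>

struct frame_id {
    unsigned format;   /* top 4 bits of the format/vector word */
    unsigned offset;   /* low 12 bits: byte offset into the vector table */
    unsigned vector;   /* vector number, offset / 4 */
};

static struct frame_id decode_formatvec(uint16_t fv)
{
    struct frame_id id;

    id.format = fv >> 12;
    id.offset = fv & 0x0fff;
    id.vector = id.offset / 4;
    return id;
}

/* Non-zero if the vector is the bus error (access fault) vector. */
static int is_bus_error(struct frame_id id)
{
    return id.vector == 2;
}

/* Returns n for "trap #n", or -1 if the vector is not a TRAP vector. */
static int trap_number(struct frame_id id)
{
    if (id.vector >= 32 && id.vector <= 47)
        return (int)id.vector - 32;
    return -1;
}
```

With 0xb008 this gives format 0xb and vector 2 (bus error); with 0x0080 from the QEMU dump it gives format 0 and trap #0.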
0xefffee74: 0xc019c000
0xefffee70: 0x00000000
0xefffee6c: 0xc0025878
0xefffee68: 0xc0007ed4
0xefffee64: 0xc0024000
0xefffee60: 0xefffef50
0xefffee5c: 0xc0024000
0xefffee58: 0xc002a034
0xefffee54: 0xc0024a90
0xefffee50: 0xc0025878
0xefffee4c: 0x00000001
0xefffee48: 0x0017f020
0xefffee44: 0x0000002c
0xefffee40: 0x0000000f
0xefffee3c: 0x00000000
0xefffee38: 0xfffff7fa
0xefffee34: 0xffffffff
0xefffee30: 0x00009782
0xefffee2c: 0x00000000
0xefffee28: 0x0000001e
0xefffee24: 0xc0025858
0xefffee20: 0xc0025af8
0xefffee1c: 0xc000b376
0xefffee18: 0xc0024000
0xefffee14: 0xc0025878
0xefffee10: 0x0000001d
0xefffee0c: 0xd0001b60
0xefffee08: 0x0000002f
0xefffee04: 0xc002563e
0xefffee00: 0xc0025490
The last address you show corresponds to 0xeffff640 in first dump
above, which is at the start of the saved fpregs. I'd say we just
miss the beginning of the signal frame?
It looks like you're right. I'm not sure how I missed that.
So when the signal was delivered, PC == 0xc00e4178 and USP ==
0xc00e4178.
USP is 0xeffff004 AFAICS. That's the location a5 was saved to above
(holding libc .got according to your interpretation).
Right, it was a typo. USP is 0xeffff004, where a5 is to be saved.
The saved PC is that from the exception frame, in this case a long bus
error sequence fault frame. The PC is that of the instruction executing
when the fault occurred. As you say, that's the moveml saving registers
to the stack.
I don't believe the whole fault frame is on the signal stack in one
contiguous piece, just the first four words, then we have struct
sigcontext. But after that, the extra contents follow, and that nicely
explains the extra bits right below the return address from the
__m68k_read_tp call.
Those addresses can be found in the disassembly and the stack contents
I sent previously (quoted above) and it all seems to line up.
(My reasoning is that copy_siginfo_to_user clears the end of the
signal stack, which is what we can see in both cases.)
Can't explain the 14 words below the saved return address though.
Right. Is it sc_fpstate? Perhaps we should expect QEMU to differ here.
See above - I think what's stored there is the extra frame content for a
format b bus error frame. But that extra frame is incomplete at best
(should be 22 longwords, only 14 are seen). Probably overwritten by the
stack frame from __GI___wait4_time64.
Maybe the exception frame leaked onto the user stack via setup_frame()?
Let's parse what's left:
<=
0xefffefe4: 0xc0028780 <= internal registers (6x)
0xefffefe0: 0x3c344bfb <=
0xefffefdc: 0x000af353 <=
0xefffefd8: 0x3c340170 <= internal reg; version no.
0xefffefd4: 0x00000000 <= data input buffer
0xefffefd0: 0xc00e417c <= internal registers (2x)
0xefffefcc: 0xc00e417e <= stage b address
0xefffefc8: 0xc00e4180 <= internal registers (4x)
0xefffefc4: 0x48e73c34 <=
0xefffefc0: 0x00000000 <= data output buffer
0xefffefbc: 0xefffeff8 <= internal registers (2x)
0xefffefb8: 0xefffeffc <= data fault address
0xefffefb4: 0x4bfb0170 <= ins stage c, stage b
0xefffefb0: 0x0eee0709 <= internal register; ssw
The fault address is the location on the stack where a2 is saved. That
does match the data output buffer contents BTW. fc, fb, rc, rb bits
clear means the fault didn't occur in stage b or c instructions. ssw bit
8 set indicates a data fault - the data cycle should be rerun on rte. rm
and rw bits clear tell us it's a write fault. If the moveml instruction
copies registers to the stack in descending order, the fault address
makes sense - the stack pointer just crossed a page boundary.
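The SSW reading above can be written down as a small decoder. This is a sketch only, assuming the 68020/68030 special status word layout (FC, FB, RC, RB in bits 15..12, DF in bit 8, RM in bit 7, RW in bit 6 with 1 meaning read, SIZ in bits 5..4, function code in bits 2..0); the struct and function names are invented here.

```c
#include <stdbool.h>
#include <stdint.h>

struct ssw_fields {
    bool fc, fb;          /* fault on stage C / stage B of the pipe */
    bool rc, rb;          /* rerun flags for stage C / stage B */
    bool df;              /* data cycle fault, rerun on rte */
    bool rm;              /* read-modify-write cycle */
    bool read;            /* RW bit: true = read, false = write */
    unsigned size_code;   /* SIZ field, 0 = longword */
    unsigned func_code;   /* FC2..FC0 of the faulting data cycle */
};

static struct ssw_fields decode_ssw(uint16_t ssw)
{
    struct ssw_fields f;

    f.fc = (ssw >> 15) & 1;
    f.fb = (ssw >> 14) & 1;
    f.rc = (ssw >> 13) & 1;
    f.rb = (ssw >> 12) & 1;
    f.df = (ssw >> 8) & 1;
    f.rm = (ssw >> 7) & 1;
    f.read = (ssw >> 6) & 1;
    f.size_code = (ssw >> 4) & 3;
    f.func_code = ssw & 7;
    return f;
}
```

For the value 0x0709 from the dump, this gives DF set, all four pipeline bits clear, RM and RW clear (a plain write), a longword transfer, and function code 1, which is user data space.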
Well spotted!
Bottom line is, the corrupted %a3 register would have been saved by
the MOVEM instruction at 0xc00e4178, which turns out to be the PC in
the signal frame. So it certainly looks like the kernel was the
culprit here.
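The register order can be checked against the dump with a short sketch. Several things here are assumptions rather than facts from the thread: that the longword 0x48e73c34 seen in the fault frame is the MOVEM opcode plus register mask, that the stack pointer was 0xeffff008 before the instruction (inferred from a5 sitting at 0xeffff004), and that pages are 4 KiB. The function names are made up.

```c
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/*
 * For MOVEM with predecrement addressing the mask is bit-reversed:
 * bit 0 is a7, bit 7 is a0, bit 8 is d7, bit 15 is d0. Registers are
 * written from a7 down to d0, each one at sp -= 4.
 *
 * slot[0..7] receive the store addresses of d0..d7, slot[8..15] those of
 * a0..a7; registers not in the mask get 0. Returns the final sp.
 */
static uint32_t movem_predec_slots(uint16_t mask, uint32_t sp, uint32_t slot[16])
{
    for (int r = 0; r < 16; r++)
        slot[r] = 0;

    for (int bit = 0; bit < 16; bit++) {
        if (mask & (1u << bit)) {
            int reg = 15 - bit;   /* bit 0 -> a7 (index 15), bit 15 -> d0 (index 0) */
            sp -= 4;
            slot[reg] = sp;
        }
    }
    return sp;
}

/* Non-zero if both addresses fall in the same (assumed 4 KiB) page. */
static int same_page(uint32_t a, uint32_t b)
{
    return a / ASSUMED_PAGE_SIZE == b / ASSUMED_PAGE_SIZE;
}
```

With mask 0x3c34 and a starting sp of 0xeffff008, a5 lands at 0xeffff004, a3 at 0xeffff000 and a2 at 0xefffeffc, matching the annotations in the core dump. Under the 4 KiB assumption, a2 is the first store on the next page down, which is the data fault address in the frame.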
I think the moveml instruction did cause a bus error, and on return from
that exception the signal got delivered.
Maybe the signal frame was partially overwritten by the resumed MOVEM?
I wonder what we'd see if we patched the kernel to log every user data
write fault caused by a MOVEM instruction. I'll try to code that up.
On entering the buserror handler, only a1 and a2 are saved, but the
comment in entry.h states that a3-a6 and d6, d7 are preserved by C code.
After buserr_c returns, a3 should be restored to what it was when taking
the bus error. With all registers restored before rte, the moveml
instruction ought to be able to resume normally.
Unless that register use constraint has changed, I don't see how a3
could have changed midway during return from the bus error exception.
But maybe a disassembly of buserr_c from your kernel could confirm that?
I disassembled the relevant build. AFAICT, buserr_c() saves and restores
those registers in the right places.
BTW, I've reproduced the failures with kernels built with both GCC 12 and
GCC 6.