Hi Finn,
Am 29.04.2023 um 12:28 schrieb Finn Thain:
[Cc kernel mailing list]
On Sat, 29 Apr 2023, Michael Schmitz wrote:
The address right past the end of the signal frame is equal to the fault
address. Any smaller gap size (as evidenced by the fact that 16 causes
corruption) will see the signal frame go past the fault address.
Using the user stack pointer to calculate the address for the signal
frame is incorrect - the user stack pointer still reflects the situation
before the faulting instruction.
Well, to quote do_page_fault(), USP is not reliable "due to some
instructions doing pre-decrement on the stack and that doesn't show up
until later".
I think what we need to do in setup_(rt_)_signal is to check if the
frame format is format b, and use the fault pc address from the
exception frame (if it's lower) instead of the usp. I'll test a patch
for that shortly.
Right. If we fix this in the signal handling code, we take care of address
errors as well, which was my concern with Andreas' patch. We can do what
do_page_fault() does and assume the worst (256 bytes?).
Well, we could do that if we could be certain this does not cause a
memory leak in some way. The reason I bring this up is that I've just
seen the kernel that I'd used to run the latest test cases (which
inserts a 20 byte gap only!) run amok terminating pretty much my entire
user space because it ran out of memory. Never seen the like of that.
I believe we can use USP to get a worst case estimate for the future
extent of the user stack. Using the fault address seems like an
unnecessary complication.
Yes, and it's not guaranteed to be a good estimate even in the general
case. Our intended use case is where the instruction was a movem, and
the fault address and USP are on adjacent pages. While that happens a
lot during function calls, it's by no means the only place it will
happen. We need to make sure the start of the frame ends up on the same
page as the fault address, if at all possible.
What is the most data a moveml <...>,sp@- can take? If that's not too
much, a constant offset for the signal stack in case of format b frames
on 020/030 might be easiest.
I agree that this should be contingent on frame format 0xB and also on
CPU_IS_020_OR_030 so the code is omitted on other builds. (Much like
Andreas' patch.)
Here's what I learned from running gdb on the LC III. The transcript below
shows a printk from handle_signal() following a printk from
bus_error030(). So this is a page fault at the top of stack page
efffe000..efffefff which also delivered a signal.
(gdb) file /root/stack-test
Reading symbols from /root/stack-test...
(No debugging symbols found in /root/stack-test)
(gdb) run 2050
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /root/stack-test 2050
[ 2958.890000] bus_error030: page fault: pc 80038d14, usp efffee28, addr efffee28
starting recursion
[ 2958.940000] bus_error030: page fault: pc 8000038c, usp efffe00c, addr efffdffc
[ 2958.960000] bus_error030: page fault: pc 8000038c, usp efffd004, addr efffcffc
[ 2958.970000] bus_error030: page fault: pc 8000038c, usp efffbffc, addr efffbffc
[ 2958.980000] bus_error030: page fault: pc 8000038c, usp efffa010, addr efff9ffc
[ 2958.990000] bus_error030: page fault: pc 8000038c, usp efff9008, addr efff8ffc
[ 2959.010000] bus_error030: page fault: pc 8000038c, usp efff8000, addr efff7ffc
[ 2959.030000] bus_error030: page fault: pc 8000038c, usp efff6014, addr efff5ffc
[ 2959.050000] bus_error030: page fault: pc 8000038c, usp efff500c, addr efff4ffc
[ 2959.060000] bus_error030: page fault: pc 8000038c, usp efff4004, addr efff3ffc
[ 2959.080000] bus_error030: page fault: pc 8000038c, usp efff2ffc, addr efff2ffc
[ 2959.100000] bus_error030: page fault: pc 8000038c, usp efff1010, addr efff0ffc
[ 2959.110000] bus_error030: page fault: pc 8000038c, usp efff0008, addr effefffc
[ 2959.130000] bus_error030: page fault: pc 8000038c, usp effef000, addr effeeffc
done.
starting recursion with fork
[Detaching after fork from child process 166]
[ 2959.310000] bus_error030: page fault: pc 8000038c, usp effff014, addr efffeffc
[ 2959.350000] handle_signal: fmtvec b008, pc 8000038c, usp efffee88
With a breakpoint set in the userspace signal handler, we can examine
stack memory from the USP as it presently is, going back up to USP as it
was at the time of the page fault (both addresses are given in printk
output above).
The difference is 396 bytes, and pahole confirms that sizeof(struct
sigframe) + sizeof(struct frame.un.fmtb) == 312 + 84 == 396.
(Note that this kernel was patched so that it doesn't truncate the result
of that pointer arithmetic. Hence there's no gap between the bottom of the
user stack and the start of the signal frame, which tends to confuse the
issue, as the gap would normally come and go unpredictably.)
Breakpoint 1, 0x800004a4 in handler ()
(gdb) x/110z 0xeffff014 - 396
0xefffee88: 0xefffee98 0x00000011 0x00000008 0xefffeea4
0xefffee98: 0x70774e40 0xf1f2f3f4 0x00000000 0x00000000
0xefffeea8: 0xeffff014 0x000007a7 0x000007a6 0x800744ec
0xefffeeb8: 0x8006d2b0 0x00008000 0x038cb008 0x7fff0000
0xefffeec8: 0xffffffff 0xffffffff 0x7fff0000 0xffffffff
0xefffeed8: 0xffffffff 0x00000000 0x00000000 0x00000000
0xefffeee8: 0x1f386980 0xf0fff0ff 0x0f800000 0xf0ff0000
0xefffeef8: 0x00000000 0x408ece9a 0x00000000 0x00000000
0xefffef08: 0x00000000 0x00000000 0x3ffe0000 0x00000000
0xefffef18: 0x00000000 0xffffffff 0x7c0effff 0x014ee530
0xefffef28: 0x08010000 0x00000004 0x00994000 0x00995e94
0xefffef38: 0x014ed540 0x00994000 0x0040ee24 0x08010000
0xefffef48: 0x00000004 0x004fa794 0x00995f18 0x00995e34
0xefffef58: 0x00079a72 0x004fa794 0x000270ce 0x00995e54
0xefffef68: 0x00027370 0x00000001 0x014f1e60 0x00995f2c
0xefffef78: 0x00994000 0x00000011 0x00995f18 0x00995e98
0xefffef88: 0x00027d08 0x00000011 0x00000004 0x00000000
0xefffef98: 0x00995f2c 0xd1d2d3d4 0xe1e2e3e4 0xf1f2f3f4
0xefffefa8: 0xeffffed0 0x00000001 0x00000001 0x00995f78
0xefffefb8: 0xb1b2b3b4 0xc1c2c3c4 0x0eee0709 0x24798004
0xefffefc8: 0xefffeffc 0xefffeff8 0xd1d2d3d4 0x48e7383c
0xefffefd8: 0x80000394 0x80000392 0x80000390 0x000007a7
0xefffefe8: 0x383cff0d 0x000ff153 0x383c2479 0x8007b4e0
0xefffeff8: 0x0000383c 0x00000000 0x80100000 0x800003f8
0xeffff008: 0x000000e0 0x014ed500 0xeffffcc4 0xc1c2c3c4
0xeffff018: 0xeffff03c 0x800003f8 0xd1d2d3d4 0xe1e2e3e4
0xeffff028: 0xf1f2f3f4 0x91929394 0xa1a2a3a4 0xb1b2b3b4
0xeffff038: 0xc1c2c3c4 0xeffff060
Here's the same hex dump with struct member names added, much like the
output from your test program:
efffee88: efffee98 sigframe.pretcode
efffee8c: 00000011 sigframe.sig
efffee90: 00000008 sigframe.code
efffee94: efffeea4 sigframe.psc
efffee98: 70774e40 sigframe.retcode
efffee9c: f1f2f3f4 sigframe.retcode
efffeea0: 00000000 sigframe.extramask
efffeea4: 00000000 sigframe.sc.mask
efffeea8: effff014 sigframe.sc.sc_usp
efffeeac: 000007a7 sigframe.sc.sc_d0
efffeeb0: 000007a6 sigframe.sc.sc_d1
efffeeb4: 800744ec sigframe.sc.sc_a0
efffeeb8: 8006d2b0 sigframe.sc.sc_a1
efffeebc: 00008000 sigframe.sc.sc_sr, sc_pc
efffeec0: 038cb008 sigframe.sc.sc_pc, sc_formatvec
efffeec4: 7fff0000 sigframe.sc.sc_fpregs
efffeec8: ffffffff sigframe.sc.sc_fpregs
efffeecc: ffffffff sigframe.sc.sc_fpregs
efffeed0: 7fff0000 sigframe.sc.sc_fpregs
efffeed4: ffffffff sigframe.sc.sc_fpregs
efffeed8: ffffffff sigframe.sc.sc_fpregs
efffeedc: 00000000 sigframe.sc.sc_fpcntl
efffeee0: 00000000 sigframe.sc.sc_fpcntl
efffeee4: 00000000 sigframe.sc.sc_fpcntl
efffeee8: 1f386980 sigframe.sc.sc_fpstate
efffeeec: f0fff0ff sigframe.sc.sc_fpstate
efffeef0: 0f800000 sigframe.sc.sc_fpstate
efffeef4: f0ff0000 sigframe.sc.sc_fpstate
efffeef8: 00000000 sigframe.sc.sc_fpstate
efffeefc: 408ece9a sigframe.sc.sc_fpstate
efffef00: 00000000 sigframe.sc.sc_fpstate
efffef04: 00000000 sigframe.sc.sc_fpstate
efffef08: 00000000 sigframe.sc.sc_fpstate
efffef0c: 00000000 sigframe.sc.sc_fpstate
efffef10: 3ffe0000 sigframe.sc.sc_fpstate
efffef14: 00000000 sigframe.sc.sc_fpstate
efffef18: 00000000 sigframe.sc.sc_fpstate
efffef1c: ffffffff sigframe.sc.sc_fpstate
efffef20: 7c0effff sigframe.sc.sc_fpstate
efffef24: 014ee530 sigframe.sc.sc_fpstate
efffef28: 08010000 sigframe.sc.sc_fpstate
efffef2c: 00000004 sigframe.sc.sc_fpstate
efffef30: 00994000 sigframe.sc.sc_fpstate
efffef34: 00995e94 sigframe.sc.sc_fpstate
efffef38: 014ed540 sigframe.sc.sc_fpstate
efffef3c: 00994000 sigframe.sc.sc_fpstate
efffef40: 0040ee24 sigframe.sc.sc_fpstate
efffef44: 08010000 sigframe.sc.sc_fpstate
efffef48: 00000004 sigframe.sc.sc_fpstate
efffef4c: 004fa794 sigframe.sc.sc_fpstate
efffef50: 00995f18 sigframe.sc.sc_fpstate
efffef54: 00995e34 sigframe.sc.sc_fpstate
efffef58: 00079a72 sigframe.sc.sc_fpstate
efffef5c: 004fa794 sigframe.sc.sc_fpstate
efffef60: 000270ce sigframe.sc.sc_fpstate
efffef64: 00995e54 sigframe.sc.sc_fpstate
efffef68: 00027370 sigframe.sc.sc_fpstate
efffef6c: 00000001 sigframe.sc.sc_fpstate
efffef70: 014f1e60 sigframe.sc.sc_fpstate
efffef74: 00995f2c sigframe.sc.sc_fpstate
efffef78: 00994000 sigframe.sc.sc_fpstate
efffef7c: 00000011 sigframe.sc.sc_fpstate
efffef80: 00995f18 sigframe.sc.sc_fpstate
efffef84: 00995e98 sigframe.sc.sc_fpstate
efffef88: 00027d08 sigframe.sc.sc_fpstate
efffef8c: 00000011 sigframe.sc.sc_fpstate
efffef90: 00000004 sigframe.sc.sc_fpstate
efffef94: 00000000 sigframe.sc.sc_fpstate
efffef98: 00995f2c sigframe.sc.sc_fpstate
efffef9c: d1d2d3d4 sigframe.sc.sc_fpstate
efffefa0: e1e2e3e4 sigframe.sc.sc_fpstate
efffefa4: f1f2f3f4 sigframe.sc.sc_fpstate
efffefa8: effffed0 sigframe.sc.sc_fpstate
efffefac: 00000001 sigframe.sc.sc_fpstate
efffefb0: 00000001 sigframe.sc.sc_fpstate
efffefb4: 00995f78 sigframe.sc.sc_fpstate
efffefb8: b1b2b3b4 sigframe.sc.sc_fpstate
efffefbc: c1c2c3c4 sigframe.sc.sc_fpstate
efffefc0: 0eee0709 fmtb.int1, ssw
efffefc4: 24798004 fmtb.isc, isb
efffefc8: efffeffc fmtb.daddr
efffefcc: efffeff8 fmtb.int2
efffefd0: d1d2d3d4 fmtb.dobuf
efffefd4: 48e7383c fmtb.int3
efffefd8: 80000394 fmtb.int3
efffefdc: 80000392 fmtb.baddr
efffefe0: 80000390 fmtb.int4
efffefe4: 000007a7 fmtb.dibuf
efffefe8: 383cff0d fmtb.int5
efffefec: 000ff153 fmtb.int5, ver, int6
efffeff0: 383c2479 fmtb.int7
efffeff4: 8007b4e0 fmtb.int7
efffeff8: 0000383c fmtb.int7
efffeffc: 00000000 fmtb.int7
effff000: * 80100000 fmtb.int7
effff004: * 800003f8 fmtb.int7
effff008: * 000000e0 fmtb.int7
effff00c: * 014ed500 fmtb.int7
effff010: * effffcc4 fmtb.int7
effff014: c1c2c3c4 user stack
effff018: effff03c user stack
effff01c: 800003f8 user stack
effff020: d1d2d3d4 user stack
effff024: e1e2e3e4 user stack
effff028: f1f2f3f4 user stack
effff02c: 91929394 user stack
effff030: a1a2a3a4 user stack
effff034: b1b2b3b4 user stack
effff038: c1c2c3c4 user stack
effff03c: effff060 user stack
The portion marked with an * is the part that damages the user stack, as
these locations are all higher than USP. This just confirms by other means
what your test program already demonstrated.
Good - as I said, I ought to have learned gdb properly when I was young ...
(gdb) c
Continuing.
[ 3334.340000] bus_error030: page fault: pc 8000038c, usp efffe00c, addr efffdffc
[ 3334.360000] bus_error030: page fault: pc 8000038c, usp efffd004, addr efffcffc
[ 3334.380000] bus_error030: page fault: pc 8000038c, usp efffbffc, addr efffbffc
[ 3334.390000] bus_error030: page fault: pc 8000038c, usp efffa010, addr efff9ffc
[ 3334.410000] bus_error030: page fault: pc 8000038c, usp efff9008, addr efff8ffc
[ 3334.430000] bus_error030: page fault: pc 8000038c, usp efff8000, addr efff7ffc
[ 3334.440000] bus_error030: page fault: pc 8000038c, usp efff6014, addr efff5ffc
[ 3334.450000] bus_error030: page fault: pc 8000038c, usp efff500c, addr efff4ffc
[ 3334.460000] bus_error030: page fault: pc 8000038c, usp efff4004, addr efff3ffc
[ 3334.480000] bus_error030: page fault: pc 8000038c, usp efff2ffc, addr efff2ffc
[ 3334.500000] bus_error030: page fault: pc 8000038c, usp efff1010, addr efff0ffc
[ 3334.510000] bus_error030: page fault: pc 8000038c, usp efff0008, addr effefffc
[ 3334.530000] bus_error030: page fault: pc 8000038c, usp effef000, addr effeeffc
[Detaching after fork from child process 167]
Breakpoint 1, 0x800004a4 in handler ()
(gdb) c
Continuing.
Program received signal SIGILL, Illegal instruction.
0x80000492 in rec ()
My test program terminates having detected the corruption visible at
0xeffff000 but gdb allows us to examine that memory afterwards:
(gdb) x/110z 0xeffff014 - 396
0xefffee88: 0xc1c2c3c4 0xefffeeb0 0x800003f8 0xd1d2d3d4
0xefffee98: 0xe1e2e3e4 0xf1f2f3f4 0x91929394 0xa1a2a3a4
0xefffeea8: 0xb1b2b3b4 0xc1c2c3c4 0xefffeed4 0x800003f8
0xefffeeb8: 0xd1d2d3d4 0xe1e2e3e4 0xf1f2f3f4 0x91929394
0xefffeec8: 0xa1a2a3a4 0xb1b2b3b4 0xc1c2c3c4 0xefffeef8
0xefffeed8: 0x800003f8 0xd1d2d3d4 0xe1e2e3e4 0xf1f2f3f4
0xefffeee8: 0x91929394 0xa1a2a3a4 0xb1b2b3b4 0xc1c2c3c4
0xefffeef8: 0xefffef1c 0x800003f8 0xd1d2d3d4 0xe1e2e3e4
0xefffef08: 0xf1f2f3f4 0x91929394 0xa1a2a3a4 0xb1b2b3b4
0xefffef18: 0xc1c2c3c4 0xefffef40 0x800003f8 0xd1d2d3d4
0xefffef28: 0xe1e2e3e4 0xf1f2f3f4 0x91929394 0xa1a2a3a4
0xefffef38: 0xb1b2b3b4 0xc1c2c3c4 0xefffef64 0x800003f8
0xefffef48: 0xd1d2d3d4 0xe1e2e3e4 0xf1f2f3f4 0x91929394
0xefffef58: 0xa1a2a3a4 0xb1b2b3b4 0xc1c2c3c4 0xefffef88
0xefffef68: 0x800003f8 0xd1d2d3d4 0xe1e2e3e4 0xf1f2f3f4
0xefffef78: 0x91929394 0xa1a2a3a4 0xb1b2b3b4 0xc1c2c3c4
0xefffef88: 0xefffefac 0x800003f8 0xd1d2d3d4 0xe1e2e3e4
0xefffef98: 0xf1f2f3f4 0x91929394 0xa1a2a3a4 0xb1b2b3b4
0xefffefa8: 0xc1c2c3c4 0xefffefd0 0x800003f8 0xd1d2d3d4
0xefffefb8: 0xe1e2e3e4 0xf1f2f3f4 0x91929394 0xa1a2a3a4
0xefffefc8: 0xb1b2b3b4 0xc1c2c3c4 0xefffeff4 0x800003f8
0xefffefd8: 0xd1d2d3d4 0xe1e2e3e4 0xf1f2f3f4 0x91929394
0xefffefe8: 0xa1a2a3a4 0xb1b2b3b4 0xc1c2c3c4 0xeffff018
0xefffeff8: 0x800003f8 0xd1d2d3d4 0x80100000 0x800003f8
0xeffff008: 0x000000e0 0x014ed500 0xeffffcc4 0xc1c2c3c4
0xeffff018: 0xeffff03c 0x800003f8 0xd1d2d3d4 0xe1e2e3e4
0xeffff028: 0xf1f2f3f4 0x91929394 0xa1a2a3a4 0xb1b2b3b4
0xeffff038: 0xc1c2c3c4 0xeffff060
(gdb)
So the page that caused the fault never shows any corruption. It turns out
that the MOVEM resumes correctly after the signal is handled, which
overwrites the stack memory used for the signal frame, starting at the top
of the new stack page. I had incorrectly assumed the MOVEM had gone awry
but it actually resumed just fine.
Yes, it's always the page where the movem started on that gets
clobbered, because we use usp to figure our where our signal stack may
end. fmtb.int2 or fmtb.daddr is the correct end location in this case.
But we need to find something that works in the general case (and then
analyze the performance impact it might have in stack and signal heavy
applications - I might have mentioned that before, but your equivalent
to Andreas' patch seemed quite a bit slower in the test case than when
signals were allowed after format b bus faults. Interrupt latency, most
likely).
Cheers,
Michael