Brian Foster wrote:
It's case 2 (above), the trampoline that has “something
to do with FPU emulation”, which has me concerned ATM.
The 4KSd core does not have an FPU. That encourages the
use of ‘-msoft-float’ (at least for performance), but does
not require it. (Albeit I wonder whether, in the restricted
world I'm playing in, it could be “required” (assuming
it doesn't have an issue?)? Hum .... .)
The use of -msoft-float historically required (and as far as I know
still requires) a completely different ground-up userland build, so it
gets used less than you might think.
The quick summary (which I'm sure others on this list can
clarify/correct) is that the FP trampoline, which is pushed on
the user-land stack, is, unlike sigreturn, not fixed code.
It varies on a per-instance per-thread basis. Hence the
simple ‘vsyscall’ mechanism ((to be?) used for sigreturn)
is inappropriate.
The trampoline is only used to execute a non-FP instruction
(<instr>) in the delay slot of an FP-instruction:
    <instr>   # Non-FP instruction to execute in user-land
    BADINST   # Bad instruction forcing return to FP emulator
    COOKIE    # Bad instruction (not executed) for verification
    <epc>     # Where to resume execution after <instr>
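
The four-word frame above can be sketched in C roughly as follows. This is only an illustration of the layout and of the check made when BADINST traps back into the emulator. The names (fp_trampoline, tramp_build, tramp_resume) and the two constant values are hypothetical placeholders, not the kernel's actual identifiers or encodings, and a 32-bit address space is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder encodings, assumed for illustration only. */
#define TRAMP_BADINST 0xDEAD0001u  /* stands in for the trapping instruction */
#define TRAMP_COOKIE  0xDEAD0002u  /* stands in for the verification word */

/* One trampoline frame, in the order it sits in user memory. */
struct fp_trampoline {
    uint32_t instr;    /* non-FP delay-slot instruction, run in user mode */
    uint32_t badinst;  /* traps, handing control back to the emulator */
    uint32_t cookie;   /* never executed; checked when the trap arrives */
    uint32_t epc;      /* where to resume once instr has executed */
};

/* Fill in a frame for one emulated FP branch. */
static void tramp_build(struct fp_trampoline *t,
                        uint32_t delay_slot_instr, uint32_t resume_epc)
{
    assert(t != NULL);
    t->instr   = delay_slot_instr;
    t->badinst = TRAMP_BADINST;
    t->cookie  = TRAMP_COOKIE;
    t->epc     = resume_epc;
}

/*
 * Called when an instruction at fault_pc traps. frame_addr is the user
 * address at which the frame was placed. Returns 1 and stores the resume
 * address if the trap really came from this trampoline, otherwise 0.
 */
static int tramp_resume(const struct fp_trampoline *t, uint32_t frame_addr,
                        uint32_t fault_pc, uint32_t *resume_epc)
{
    uint32_t bad_addr =
        frame_addr + (uint32_t)offsetof(struct fp_trampoline, badinst);

    if (fault_pc != bad_addr)
        return 0;  /* the trap happened somewhere else */
    if (t->badinst != TRAMP_BADINST || t->cookie != TRAMP_COOKIE)
        return 0;  /* the frame does not look like ours */
    *resume_epc = t->epc;
    return 1;
}
```

Because the first word differs for every emulated branch and the last word differs for every resume address, the frame contents cannot be shared as a fixed code sequence the way a sigreturn stub can.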
Belch! ;-\ Whilst I can think of a few things that may work
(temporarily change page permissions; or go ahead and use
the ‘vsyscall’ page with some interlocking magic; or a new
dedicated per-thread page; or ...?) none seem appealing.
Suggestions? Comments? Prior art to study?
As the jerk who originally bolted the FP emulator into the MIPS kernel
and came up with the stack trampoline hack, I can explain why it seemed
sane at the time. If an FP branch is emulated and to be taken, we have to
find a way for the instruction in the delay slot to be executed prior to the
transfer of control to the branch target. It has to execute with the user's
permissions. Putting it on the user's stack and building a trampoline was
the fairly classical way of doing it, but note that it's architecturally illegal
to put a branch in a branch delay slot (floating point or otherwise), so
there's no possibility of recursion. So one only needs 3-4 words (one
could substitute another means of validation for the cookie) per
thread. It just has to be part of the user's address space. I suppose
that instead of using a few words just above the stack, one could use
a few words just below the current "brk()" point, or, better still (but
far more invasive) pad the text segment, which should always be
executable, with 4 words that the kernel can find in a hurry.
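
The "no recursion" argument rests on the delay-slot instruction never being a branch itself. As a rough illustration, an emulator could check that before building a trampoline with something like the sketch below. The function name is made up for this example, and the decode covers only the common MIPS32 branch and jump encodings (it is not exhaustive; coprocessor 2 branches, for one, are left out).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if the 32-bit MIPS instruction word
 * is one of the common branch or jump encodings. Not an exhaustive decode.
 */
static bool mips_is_branch_or_jump(uint32_t insn)
{
    uint32_t op    = insn >> 26;           /* major opcode, bits 31..26 */
    uint32_t rs    = (insn >> 21) & 0x1f;  /* bits 25..21 */
    uint32_t rt    = (insn >> 16) & 0x1f;  /* bits 20..16 */
    uint32_t funct = insn & 0x3f;          /* bits 5..0 */

    switch (op) {
    case 0x00:  /* SPECIAL: jr and jalr */
        return funct == 0x08 || funct == 0x09;
    case 0x01:  /* REGIMM: bltz, bgez, their likely and -al forms */
        return rt <= 0x03 || (rt >= 0x10 && rt <= 0x13);
    case 0x02: case 0x03:                         /* j, jal */
    case 0x04: case 0x05: case 0x06: case 0x07:   /* beq, bne, blez, bgtz */
    case 0x14: case 0x15: case 0x16: case 0x17:   /* branch-likely forms */
        return true;
    case 0x11:  /* COP1: bc1f, bc1t and likely forms when rs == 8 */
        return rs == 0x08;
    default:
        return false;
    }
}
```

If a check like this rejects the delay-slot word, the trampoline is never entered from within another trampoline, which is why a single 3-4 word slot per thread is enough wherever it ends up living.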
Regards,
Kevin K.