Kevin D. Kissell wrote:
Well, he's *almost* right about that. The delay slot emulation function
executes a single instruction off the user stack/vdso slot, which is
followed in memory by an instruction that provokes an address
exception. The address exception handler detects the special case (and
it should be noted that detecting the special case could be made simpler
and more reliable if a vdso-type region were used),
cleans up, and restores normal stack behavior.
[Ralf recently changed the trapping instruction to a 'break' instruction,
but the logic remains the same.]
That "clean up" could, of course,
include any necessary vdso slot management. But what about cases that
won't get to the magic alignment trap?
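The frame being described can be modelled in a few lines of C. This is only a user-space sketch, not the kernel's code: the names `ds_frame`, `ds_frame_setup`, `ds_frame_finish` and `TRAP_INSN` are hypothetical, and the frame is an ordinary struct rather than a real stack or vdso location.

```c
#include <stdint.h>

/* Hypothetical stand-in for the trapping instruction that follows the
 * delay slot instruction. 0x0000000d is the MIPS 'break' encoding with
 * code 0; the real code value used is not given in this thread. */
#define TRAP_INSN 0x0000000du

struct ds_frame {
    uint32_t  emul;    /* copy of the delay slot instruction */
    uint32_t  trap;    /* instruction that traps back into the kernel */
    uintptr_t resume;  /* where the thread continues after cleanup */
    int       in_use;  /* nonzero while an emulation is outstanding */
};

/* Fill in a frame and return the address user mode should execute from. */
uintptr_t ds_frame_setup(struct ds_frame *f, uint32_t insn, uintptr_t resume)
{
    f->emul = insn;
    f->trap = TRAP_INSN;
    f->resume = resume;
    f->in_use = 1;
    return (uintptr_t)&f->emul;
}

/* Trap handler side: detect the special case. Returns 1 and stores the
 * continuation address if trap_pc is this frame's trapping instruction,
 * otherwise returns 0 and leaves the frame alone. */
int ds_frame_finish(struct ds_frame *f, uintptr_t trap_pc, uintptr_t *next_pc)
{
    if (!f->in_use || trap_pc != (uintptr_t)&f->trap || f->trap != TRAP_INSN)
        return 0;
    *next_pc = f->resume;
    f->in_use = 0;  /* any slot management would hang off this cleanup */
    return 1;
}
```

The sketch only covers the path where the trap is actually reached; the cases where it is not are the subject of the rest of this message.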
As the instruction being executed is extracted from a branch delay slot,
we know it's not legal for it to be any sort of branch or jump
instruction.
We would detect these, and since the behavior is 'UNPREDICTABLE', we can
treat them as a nop and remain within the specified behavior.
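A rough sketch of that check, assuming only the classic MIPS32 branch and jump encodings (no ASE or later-revision branches). The function names `is_branch_or_jump` and `sanitize_delay_slot` are made up for illustration.

```c
#include <stdint.h>

/* Returns 1 if insn is one of the classic MIPS32 branches or jumps. */
int is_branch_or_jump(uint32_t insn)
{
    uint32_t op    = insn >> 26;           /* major opcode, bits 31..26 */
    uint32_t rs    = (insn >> 21) & 0x1f;  /* bits 25..21 */
    uint32_t rt    = (insn >> 16) & 0x1f;  /* bits 20..16 */
    uint32_t funct = insn & 0x3f;          /* bits 5..0 */

    switch (op) {
    case 0x00:                              /* SPECIAL: JR, JALR */
        return funct == 0x08 || funct == 0x09;
    case 0x01:                              /* REGIMM: BLTZ/BGEZ families */
        return rt <= 0x03 || (rt >= 0x10 && rt <= 0x13);
    case 0x02: case 0x03:                   /* J, JAL */
    case 0x04: case 0x05:                   /* BEQ, BNE */
    case 0x06: case 0x07:                   /* BLEZ, BGTZ */
    case 0x14: case 0x15:                   /* BEQL, BNEL */
    case 0x16: case 0x17:                   /* BLEZL, BGTZL */
        return 1;
    case 0x11: case 0x12:                   /* COP1/COP2: BC1x, BC2x */
        return rs == 0x08;
    default:
        return 0;
    }
}

/* Replace a branch or jump found in a delay slot with a nop (all zeros). */
uint32_t sanitize_delay_slot(uint32_t insn)
{
    return is_branch_or_jump(insn) ? 0u : insn;
}
```

For example, `jr $ra` (0x03e00008) would be replaced by a nop, while `addiu $sp,$sp,-8` (0x27bdfff8) and the trap `teqi` (a REGIMM instruction that is not a branch) pass through unchanged.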
But it *could* be a trap or system call instruction, or a
load/store that would provoke a TLB exception. In the usual cases,
however, as I believe David was alluding, either the exception will
ultimately unwind to return to execute the magic alignment trap, or the
thread will exit, and could free the emulation slot as part of general
cleanup.
But there's a case that isn't handled in this model, and that's the case
of an exception (or interrupt that falls in the 2-instruction window)
resulting in a signal that is caught and dispatched, and where either
the signal handler does a longjmp and restarts FP computation, or where
the signal handler itself contains a FP branch with yet another delay
slot to be emulated. One *could* get an alarm signal before the original
delay slot instruction is executed, so recycling the same vdso cache
line would be premature. It's hard to get away from something
distinctly stack-like if one wants to cover these cases.
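To make the "stack-like" point concrete, here is a toy model in which each nested emulation gets its own slot, so an emulation started from a signal handler cannot overwrite the one it interrupted. Everything in it (`ds_stack`, `ds_push`, `ds_pop`, the depth limit) is hypothetical.

```c
#include <stdint.h>

#define DS_MAX_DEPTH 4  /* arbitrary nesting limit for the sketch */

struct ds_slot {
    uint32_t  emul;     /* delay slot instruction being emulated */
    uintptr_t resume;   /* continuation address for this emulation */
};

struct ds_stack {
    struct ds_slot slot[DS_MAX_DEPTH];
    int depth;          /* number of outstanding emulations */
};

/* Start a new emulation. Returns the slot index, or -1 if the stack is full. */
int ds_push(struct ds_stack *s, uint32_t insn, uintptr_t resume)
{
    if (s->depth >= DS_MAX_DEPTH)
        return -1;
    s->slot[s->depth].emul = insn;
    s->slot[s->depth].resume = resume;
    return s->depth++;
}

/* Finish the innermost emulation. Returns 0 and stores its continuation
 * address, or -1 if nothing is outstanding. */
int ds_pop(struct ds_stack *s, uintptr_t *resume)
{
    if (s->depth == 0)
        return -1;
    *resume = s->slot[--s->depth].resume;
    return 0;
}
```

Note what the sketch does not solve: a handler that longjmps away never pops its entry, so a real design would still need some way to discard stale slots.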
System calls we don't have to handle; they will eventually return to the
break instruction following the delay slot instruction and be handled by
the normal processing.
I am thinking that all other exceptions will result in one of three cases:
1) They will work like system calls and return to the 'break'.
2) The thread will exit.
3) They result in a signal being sent to the thread. We can handle it
in force_signal(). In this case we would adjust the saved PC (EPC) to point at the
original location of the instruction and clean things up. If the
signal handler tries to restart the instruction, the FP emulator will
re-run the emulation.
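Case 3 might look roughly like the following. This is a sketch under assumptions, not a patch: the struct and function names are invented, the frame is assumed to be two 4-byte instructions, and the second branch (signal arrives when only the 'break' is left to run) is my guess at a sensible extension rather than something stated above.

```c
#include <stdint.h>

struct ds_state {
    int       active;     /* nonzero while a delay slot emulation is pending */
    uintptr_t frame_pc;   /* address of the copied instruction in the frame */
    uintptr_t orig_pc;    /* original location to restart from */
    uintptr_t resume_pc;  /* continuation once the emulation has completed */
};

/* Called before a signal is delivered. Returns the PC that should be
 * recorded in the signal context, releasing the frame if the thread was
 * caught inside it. */
uintptr_t ds_fixup_for_signal(struct ds_state *st, uintptr_t pc)
{
    if (!st->active)
        return pc;
    if (pc == st->frame_pc) {
        /* Copied instruction not completed: rewind so that a restart
         * goes back through the FP emulator. */
        st->active = 0;
        return st->orig_pc;
    }
    if (pc == st->frame_pc + 4) {
        /* Assumption: instruction done, only the trap remains, so the
         * emulation can be treated as finished. */
        st->active = 0;
        return st->resume_pc;
    }
    return pc;  /* not in the frame: nothing to fix up */
}
```

With the frame released here, a nested FP branch in the signal handler could reuse the same slot.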
My short-term suggestion would be to leave FP emulator delay slot
handling on the (executable) user stack, even if signal trampolines use
the vdso.
They are really two separate (but related) problems. If we want
eXecute-Inhibit for the stack, we need to solve it.
Longer term, we might consider what sorts of crockery would
be necessary to deal with delay slot abandonment and recursion. That
might mean adding cruft to the signal dispatch logic to detect that
we're in mid-delay-slot-emulation and defer the signal until after the
alignment trap cleanup is done (adds annoying run-time overhead, but is
probably the smallest increase in footprint and complexity), or it might
mean changing the delay slot emulation paradigm completely and bolting a
full instruction set emulator into the FP emulator, so that the delay
slot instruction is simulated in kernel mode, rather than requiring
execution in user mode. I rejected that idea out-of-hand when I first
did the FP emulator integration with the kernel, years ago, but maybe
the constraints have changed...
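The first option, deferring signals while an emulation is in flight, can be modelled like this. The names `thread_model`, `try_deliver` and `finish_emulation` are hypothetical, and signal numbers are assumed to fit in a 32-bit mask (1..31).

```c
#include <stdint.h>

struct thread_model {
    int      in_ds_emul;  /* set while the delay slot frame is executing */
    uint32_t deferred;    /* bitmask of signals held back meanwhile */
};

/* Returns 1 if the signal may be delivered now, 0 if it was deferred. */
int try_deliver(struct thread_model *t, int sig)
{
    if (t->in_ds_emul) {
        t->deferred |= 1u << sig;
        return 0;
    }
    return 1;
}

/* Called from the trap cleanup path. Ends the emulation and returns the
 * mask of signals that were held back and can be delivered now. */
uint32_t finish_emulation(struct thread_model *t)
{
    uint32_t ready = t->deferred;
    t->in_ds_emul = 0;
    t->deferred = 0;
    return ready;
}
```

This only works for asynchronous signals such as an alarm. A synchronous fault raised by the delay slot instruction itself cannot be deferred, since the cleanup trap would never be reached.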
I think full instruction set emulation is not so easy. How would you
emulate COP2 instructions?
David Daney