Re: [PATCH 1/2] MIPS: Preliminary vdso.

David Daney <ddaney@xxxxxxxxxxxxxxxxxx> · Mon, 27 Apr 2009 08:54:46 -0700

Kevin D. Kissell wrote:

Well, he's *almost* right about that. The delay slot emulation function 
executes a single instruction off the user stack/vdso slot, which is 
followed in memory by an instruction that provokes an address 
exception.  The address exception handler detects the special case (and 
it should be noted that detecting the special case could be made simpler 
and more reliable if a vdso-type region were used),

Ralf recently changed this to a 'break' instruction, but the logic 
remains the same.

cleans up, and 
restores normal stack behavior.  That "clean up" could, of course, 
include any necessary vdso slot management.  But what about cases that 
won't get to the magic alignment trap?
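
For reference, the two-word arrangement being discussed (one copied instruction followed by a trapping instruction, now a 'break') can be sketched roughly as below. This is only an illustration: struct dsemul_frame, DSEMUL_BREAK_CODE, and both helper functions are hypothetical names, and the break code value is a placeholder, not the one the kernel uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical break code; the real value is not given in this thread. */
#define DSEMUL_BREAK_CODE 0x1u

/* MIPS32 BREAK: opcode SPECIAL (0), 20-bit code in bits 25..6, funct 0x0d. */
#define MIPS_BREAK(code) ((((uint32_t)(code) & 0xfffffu) << 6) | 0x0du)

/* Hypothetical emulation frame: the first two words are what user mode runs. */
struct dsemul_frame {
    uint32_t insn;      /* copy of the delay slot instruction */
    uint32_t trap;      /* 'break' that hands control back to the kernel */
    uint32_t cont_epc;  /* where to resume once cleanup is done */
};

/* Fill in a frame for one delay slot instruction. */
static void dsemul_frame_init(struct dsemul_frame *f, uint32_t insn,
                              uint32_t cont_epc)
{
    f->insn = insn;
    f->trap = MIPS_BREAK(DSEMUL_BREAK_CODE);
    f->cont_epc = cont_epc;
}

/*
 * Break-handler side: the special case is recognised when the faulting
 * address is the second word of the frame and that word is our break.
 */
static bool dsemul_is_frame_trap(const struct dsemul_frame *f,
                                 uint32_t frame_addr, uint32_t epc)
{
    return epc == frame_addr + 4 && f->trap == MIPS_BREAK(DSEMUL_BREAK_CODE);
}
```

With a fixed vdso-type region, the frame_addr check becomes a simple range test, which is the simplification mentioned above.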

As the instruction being executed is extracted from a branch delay slot, 
we know it's not legal for it to be any sort of branch or jump 
instruction.

These we would detect, and since the behavior is 'UNPREDICTABLE', we can 
treat them as a nop and remain within the specified behavior.
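
A rough sketch of that check, assuming classic MIPS32 encodings. The function names are hypothetical and the opcode list is not exhaustive (for example, it ignores the MIPS-3D bc1any forms and any ASE-specific branches), so treat it as an illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIPS_NOP 0x00000000u  /* sll $0, $0, 0 */

/* Return true if insn is a branch or jump (subset of MIPS32 encodings). */
static bool mips_is_branch_or_jump(uint32_t insn)
{
    uint32_t op    = insn >> 26;
    uint32_t rs    = (insn >> 21) & 0x1f;
    uint32_t rt    = (insn >> 16) & 0x1f;
    uint32_t funct = insn & 0x3f;

    switch (op) {
    case 0x00:  /* SPECIAL: jr, jalr */
        return funct == 0x08 || funct == 0x09;
    case 0x01:  /* REGIMM: bltz/bgez families, including likely and link forms */
        return rt <= 0x03 || (rt >= 0x10 && rt <= 0x13);
    case 0x02: case 0x03:                        /* j, jal */
    case 0x04: case 0x05: case 0x06: case 0x07:  /* beq, bne, blez, bgtz */
    case 0x14: case 0x15: case 0x16: case 0x17:  /* beql, bnel, blezl, bgtzl */
        return true;
    case 0x11: case 0x12:  /* COP1/COP2: bc1f/bc1t, bc2f/bc2t have rs == 8 */
        return rs == 0x08;
    default:
        return false;
    }
}

/* Replace an illegal delay slot branch with a nop; pass others through. */
static uint32_t dsemul_sanitize(uint32_t insn)
{
    return mips_is_branch_or_jump(insn) ? MIPS_NOP : insn;
}
```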

But it *could* be a trap or system call instruction, or a 
load/store that would provoke a TLB exception.  In the usual cases, 
however, as I believe David was alluding, either the exception will 
ultimately unwind to return to execute the magic alignment trap, or the 
thread will exit, and could free the emulation slot as part of general 
cleanup.

But there's a case that isn't handled in this model, and that's the case 
of an exception (or interrupt that falls in the 2-instruction window) 
resulting in a signal that is caught and dispatched, and where either 
the signal handler does a longjmp and restarts FP computation, or where 
the signal handler itself contains a FP branch with yet another delay 
slot to be emulated. One *could* get an alarm signal before the original 
delay slot instruction is executed, so recycling the same vdso cache 
line would be premature.  It's hard to get away from something 
distinctly stack-like if one wants to cover these cases.

System calls we don't have to handle; they will eventually return to the 
break instruction following the delay slot instruction and be handled by 
the normal processing.

I am thinking that all other exceptions will result in one of three cases:

1) They will work like system calls and return to the 'break'.

2) The thread will exit.

3) They result in a signal being sent to the thread.  We can handle it 
in force_signal().  In this case we would adjust the EPC to point at the 
original location of the instruction and clean things up.  If the 
signal handler tries to restart the instruction, the FP emulator will 
re-run the emulation.
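
To make case 3 concrete, here is a minimal user-space model of the fixup at signal delivery time. Everything in it is hypothetical (struct dsemul_state, struct user_regs, dsemul_signal_fixup); it is not existing kernel code. It also makes one assumption that goes beyond what is described above: if the saved program counter (EPC on MIPS) already points at the 'break', the delay slot instruction has run, so the sketch resumes at the continuation address instead of rewinding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread bookkeeping for one in-flight emulation. */
struct dsemul_state {
    bool     active;      /* an emulation frame is currently live */
    uint32_t frame_addr;  /* address of the copied delay slot instruction */
    uint32_t orig_epc;    /* original location to restart emulation from */
    uint32_t cont_epc;    /* where the 'break' cleanup would have resumed */
};

/* Hypothetical stand-in for the saved user register state. */
struct user_regs {
    uint32_t epc;
};

/*
 * Called before a signal frame is built. Returns true if the saved EPC
 * was inside the emulation frame and has been rewritten.
 */
static bool dsemul_signal_fixup(struct dsemul_state *st, struct user_regs *regs)
{
    if (!st->active)
        return false;

    if (regs->epc == st->frame_addr) {
        /* Delay slot instruction not completed: restart from the original. */
        regs->epc = st->orig_epc;
    } else if (regs->epc == st->frame_addr + 4) {
        /* Assumption: instruction already ran, only the 'break' is left. */
        regs->epc = st->cont_epc;
    } else {
        return false;  /* EPC is not inside the emulation frame */
    }

    st->active = false;  /* clean up: the frame can be reused */
    return true;
}
```

After the fixup the signal handler never sees an EPC inside the frame, so a longjmp or a nested FP branch in the handler cannot strand it.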

My short-term suggestion would be to leave FP emulator delay slot 
handling on the (executable) user stack, even if signal trampolines use 
the vdso.

They are really two separate (but related) problems.  If we want 
eXecute-Inhibit for the stack we need to solve it.

Longer term, we might consider what sorts of crockery would 
be necessary to deal with delay slot abandonment and recursion.  That 
might mean adding cruft to the signal dispatch logic to detect that 
we're in mid-delay-slot-emulation and defer the signal until after the 
alignment trap cleanup is done (adds annoying run-time overhead, but is 
probably the smallest increase in footprint and complexity), or it might 
mean changing the delay slot emulation paradigm completely and bolting a 
full instruction set emulator into the FP emulator, so that the delay 
slot instruction is simulated in kernel mode, rather than requiring 
execution in user mode.  I rejected that idea out-of-hand when I first 
did the FP emulator integration with the kernel, years ago, but maybe 
the constraints have changed...

I think full instruction set emulation is not so easy.  How would you 
emulate COP2 instructions?

David Daney