> This is quite clever, but now I'm wondering just how much kernel help
> is really needed. In your series, the trampoline is a non-executable
> page.  I can think of at least two alternative approaches, and I'd
> like to know the pros and cons.
>
> 1. Entirely userspace: a return trampoline would be something like:
>
> 1:
>     pushq %rax
>     pushq %rbx
>     pushq %rcx
>     ...
>     pushq %r15
>     movq %rsp, %rdi           # pointer to saved regs
>     leaq 1b(%rip), %rsi       # pointer to the trampoline itself
>     callq trampoline_handler  # see below

For nested calls (where the trampoline needs to pass the
original stack frame to the nested function) I think you
just need a page full of:
	mov $0, scratch_reg; jmp trampoline_handler
	mov $1, scratch_reg; jmp trampoline_handler
You need an unused register; on x86-64 I think both
r10 and r11 are available.
On i386 I think eax can be used.
The first argument register might even be available,
if that is what is used to pass in the stack frame.

The trampoline_handler then uses the passed-in value
to index an array of stack frame and function pointers,
and jumps to the real function.
You need to hold everything in __thread data.
You may also need to be able to allocate an extra page for
deeply nested code paths (e.g. recursive nested functions).

You might then need a driver to create a suitable
executable page for you. Somehow you need to pass in the
address of the trampoline_handler and the number for the
first fault.
It needs to pass back the 'stride' of the array and the
number of elements created.

But if you can take the cost of the page fault, then you
can interpret the existing trampoline in userspace
within the signal handler.
This is two kernel entry/exits.

Arbitrary JIT is a different problem entirely.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
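
For what it's worth, the table lookup that trampoline_handler would do with the value loaded by each "mov $N, scratch_reg" stub could look roughly like the C below. This is only a sketch under assumptions: the names tramp_entry, tramp_push, tramp_pop, trampoline_dispatch and the fixed TRAMP_SLOTS size are made up for illustration, the index is passed as an ordinary argument rather than in a scratch register, and the nested function is modelled as a plain function taking its parent's frame as the first argument. It uses the GCC/Clang __thread extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: one per live nested-function trampoline. */
struct tramp_entry {
	void *frame;                        /* enclosing function's stack frame */
	long (*fn)(void *frame, long arg);  /* the real nested function */
};

#define TRAMP_SLOTS 64                      /* assumed size, one stub per slot */

/* Everything is per-thread, as each thread has its own call stack. */
static __thread struct tramp_entry tramp_table[TRAMP_SLOTS];
static __thread size_t tramp_used;

/* Claim the next slot; returns its index, or -1 if the table is full. */
static long tramp_push(void *frame, long (*fn)(void *, long))
{
	if (tramp_used == TRAMP_SLOTS)
		return -1;
	tramp_table[tramp_used].frame = frame;
	tramp_table[tramp_used].fn = fn;
	return (long)tramp_used++;
}

/* Release the most recently claimed slot when the parent returns. */
static void tramp_pop(void)
{
	assert(tramp_used > 0);
	tramp_used--;
}

/* What the handler does with the index N from "mov $N, scratch_reg". */
static long trampoline_dispatch(size_t idx, long arg)
{
	assert(idx < tramp_used);
	return tramp_table[idx].fn(tramp_table[idx].frame, arg);
}

/* Stand-in for a nested function that reads a local of its parent. */
struct outer_frame {
	long base;
};

static long nested_add(void *frame, long arg)
{
	return ((struct outer_frame *)frame)->base + arg;
}

/* Parent: registers the nested function, calls it via the table, cleans up. */
static long outer(long base, long arg)
{
	struct outer_frame f = { base };
	long idx = tramp_push(&f, nested_add);
	long ret;

	if (idx < 0)
		return -1;
	ret = trampoline_dispatch((size_t)idx, arg);
	tramp_pop();
	return ret;
}
```

A real handler would tail-jump to the target with the caller's argument registers untouched instead of making a C call, and would need some way to grow the table (the "extra page" case) instead of failing when it is full.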