On 8/3/20 3:08 AM, David Laight wrote:
> From: Pavel Machek <pavel@xxxxxx>
>> Sent: 02 August 2020 12:56
>> Hi!
>>
>>>> This is quite clever, but now I'm wondering just how much kernel help
>>>> is really needed. In your series, the trampoline is a non-executable
>>>> page. I can think of at least two alternative approaches, and I'd
>>>> like to know the pros and cons.
>>>>
>>>> 1. Entirely userspace: a return trampoline would be something like:
>>>>
>>>> 1:
>>>>     pushq %rax
>>>>     pushq %rbx
>>>>     pushq %rcx
>>>>     ...
>>>>     pushq %r15
>>>>     movq %rsp, %rdi # pointer to saved regs
>>>>     leaq 1b(%rip), %rsi # pointer to the trampoline itself
>>>>     callq trampoline_handler # see below
>>> For nested calls (where the trampoline needs to pass the
>>> original stack frame to the nested function) I think you
>>> just need a page full of:
>>>     mov $0, scratch_reg; jmp trampoline_handler
>> I believe you could do with mov %pc, scratch_reg; jmp ...
>>
>> That has advantage of being able to share single physical
>> page across multiple virtual pages...
> A lot of architectures don't let you copy %pc that way so you would
> have to use 'call' - but that trashes the return address cache.
> It also needs the trampoline handler to know the addresses
> of the trampolines.

Do you know which ones don't allow you to copy %pc?

Some of the architectures do not have PC-relative data references.
If they also do not allow you to copy the PC into a general purpose
register, then there is no way to implement the statically defined
trampoline that has been discussed so far. In those cases, the
trampoline has to be generated at runtime.

Thanks.

Madhavan
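
For illustration only, here is a rough sketch of what generating the
"page full of mov/jmp" stubs at runtime could look like on x86-64. It only
builds the bytes of the page and does not map or execute them. The choice of
%r11d as the scratch register, the 16-byte slot size, the per-slot index as
the immediate, and the names make_stub/make_page are all assumptions made for
this example; none of them comes from the series.

```python
import struct

STUB_SIZE = 16    # assumed slot size: 11 bytes of code padded to 16
PAGE_SIZE = 4096  # assumed page size


def make_stub(index: int, stub_addr: int, handler_addr: int) -> bytes:
    """Encode `movl $index, %r11d; jmp handler` for a stub at stub_addr."""
    # 41 BB imm32: movl $imm32, %r11d (scratch register is an assumption)
    mov = b"\x41\xbb" + struct.pack("<I", index)
    # E9 rel32: the displacement is relative to the end of the jmp itself.
    rel = handler_addr - (stub_addr + len(mov) + 5)
    # struct.pack raises struct.error if the handler is out of rel32 range.
    jmp = b"\xe9" + struct.pack("<i", rel)
    code = mov + jmp
    # Pad the rest of the slot with int3 so stray jumps trap.
    return code + b"\xcc" * (STUB_SIZE - len(code))


def make_page(page_addr: int, handler_addr: int) -> bytes:
    """Build one page of stubs, each loading its own slot index."""
    return b"".join(
        make_stub(i, page_addr + i * STUB_SIZE, handler_addr)
        for i in range(PAGE_SIZE // STUB_SIZE)
    )
```

Because every slot carries a different immediate and a different jmp
displacement, the contents of such a page depend on the index and on where
the page is mapped. That is the cost the "mov %pc, scratch_reg" variant
avoids: identical stubs could share one physical page, but only on
architectures that let you read the PC.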