Re: [PATCH v2 0/4] [RFC] Implement Trampoline File Descriptor

Florian Weimer <fw@xxxxxxxxxxxxx> · Thu, 24 Sep 2020 22:52:38 +0200

* Madhavan T. Venkataraman:

> Otherwise, using an ABI quirk or a calling convention side effect to
> load the PC into a GPR is, IMO, non-standard or non-compliant or
> non-approved or whatever you want to call it. I would be
> conservative and not use it. Who knows what incompatibility there
> will be with some future software or hardware features?

AArch64 PAC makes a backwards-incompatible change that touches this
area, but we'll see if they can actually get away with it.

In general, these things are baked into the ABI, even if they are not
spelled out explicitly in the psABI supplement.

> For instance, in the i386 example, we do a call without a matching return.
> Also, we use a pop to undo the call. Can anyone tell me if this kind of use
> is an ABI approved one?

Yes, for i386, this is completely valid from an ABI point of view.
It's equally possible to use a regular function call and just read the
return address that has been pushed to the stack.  Then there's no
stack mismatch at all.  Return stack predictors (including the one
used by SHSTK) also recognize the CALL 0 construct, so that's fine as
well.  The i386 psABI does not use function descriptors, and either
approach (out-of-line thunk or CALL 0) is in common use to materialize
the program counter in a register and construct the GOT pointer.
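
As a rough illustration of the out-of-line thunk variant (a regular
call that just reads the pushed return address), here is a small C
sketch.  It assumes GCC or Clang, since it relies on the
__builtin_return_address builtin and the noinline attribute; the
function name get_pc is made up for this example.  Real i386 PIC code
uses a tiny assembly thunk instead, but the idea is the same.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns the address of the instruction that
 * follows the call to get_pc() in the caller.  The call instruction
 * stores that address as the return address, and the builtin reads
 * it back, so call and return stay balanced.
 *
 * noinline is required: if the function were inlined, there would be
 * no call and the builtin would report the caller's return address.
 */
__attribute__((noinline)) uintptr_t get_pc(void)
{
    return (uintptr_t)__builtin_return_address(0);
}
```

A caller can add a link-time constant offset to the returned value to
locate nearby data, which is what the GOT pointer setup does.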

> If the kernel supplies this, then all applications and libraries can use
> it for all architectures with one single, simple API. Without this, each
> application/library has to roll its own solution for every architecture-ABI
> combo it wants to support.

Are there any other users of these type-generic trampolines?
Everything else I've seen generates machine code specific to the
function being called.  libffi is quite the outlier in my experience
because the trampoline calls a generic data-driven
marshaller/unmarshaller.  The other trampoline generators put this
marshalling code directly into the generated trampoline.

I'm still not convinced that this can't be done directly in libffi,
without kernel help.  Hiding the architecture-specific code in the
kernel doesn't reduce overall system complexity.

> As an example, in libffi:
>
> 	ffi_closure_alloc() would call alloc_tramp()
>
> 	ffi_prep_closure_loc() would call init_tramp()
>
> 	ffi_closure_free() would call free_tramp()
>
> That is it! It works on all the architectures supported in the kernel for
> trampfd.

ffi_prep_closure_loc would still need to check whether the trampoline
has been allocated by alloc_tramp because some applications supply
their own (executable and writable) mapping.  ffi_closure_alloc would
need to support sizes that differ from the trampoline size.  It's
also unclear to me to what extent software out there writes to the
trampoline data directly, bypassing the libffi API (the structs are
not opaque, after all).  And all the existing libffi memory management
code (including the embedded dlmalloc copy) would be needed to support
kernels without trampfd for years to come.
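
To make the first point concrete, here is a minimal sketch of the kind
of check ffi_prep_closure_loc would need.  This is not libffi code:
the region table and the names tramp_region_register and
tramp_is_managed are hypothetical, and a real implementation would
also need locking and removal of regions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRAMP_REGIONS 16

/* One address range handed out by the (hypothetical) allocator. */
struct tramp_region {
    uintptr_t start;
    size_t len;
};

static struct tramp_region regions[MAX_TRAMP_REGIONS];
static size_t nregions;

/* Record a range as allocator-owned.  Returns false if the table is
   full or the range is empty. */
bool tramp_region_register(const void *start, size_t len)
{
    if (nregions == MAX_TRAMP_REGIONS || len == 0)
        return false;
    regions[nregions].start = (uintptr_t)start;
    regions[nregions].len = len;
    nregions++;
    return true;
}

/* True if p lies inside a registered range.  A false result means the
   application supplied its own mapping, so the caller has to fall
   back to writing the trampoline code itself. */
bool tramp_is_managed(const void *p)
{
    uintptr_t addr = (uintptr_t)p;

    for (size_t i = 0; i < nregions; i++) {
        if (addr >= regions[i].start &&
            addr - regions[i].start < regions[i].len)
            return true;
    }
    return false;
}
```

Both code paths have to stay in the library, which is the maintenance
cost described above.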

I very much agree that we have a gap in libffi when it comes to
JIT-less operation.  But I'm not convinced that kernel support is
needed to close it, or that it is even the right design.