Re: [PATCH bpf-next 22/24] s390/bpf: Implement arch_prepare_bpf_trampoline()

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 25 Jan 2023 17:15:52 -0800

On Wed, Jan 25, 2023 at 1:39 PM Ilya Leoshkevich <iii@xxxxxxxxxxxxx> wrote:
>
> arch_prepare_bpf_trampoline() is used for direct attachment of eBPF
> programs to various places, bypassing kprobes. It's responsible for
> calling a number of eBPF programs before, instead of, and/or after
> whatever they are attached to.
>
> Add a s390x implementation, paying attention to the following:
>
> - Reuse the existing JIT infrastructure, where possible.
> - Like the existing JIT, prefer making multiple passes instead of
>   backpatching. Currently 2 passes are enough. If a literal pool is
>   introduced, this needs to be raised to 3. However, at the moment
>   adding a literal pool only makes the code larger. If branch
>   shortening is introduced, the number of passes needs to be
>   increased even further.
> - Support both regular and ftrace calling conventions, depending on
>   the trampoline flags.
> - Use expolines for indirect calls.
> - Handle the mismatch between the eBPF and the s390x ABIs.
> - Sign-extend fmod_ret return values.
>
> invoke_bpf_prog() produces about 120 bytes; it might be possible to
> slightly optimize this, but reaching 50 bytes, like on x86_64, looks
> unrealistic: just loading cookie, __bpf_prog_enter, bpf_func, insnsi
> and __bpf_prog_exit as literals already takes at least 5 * 12 = 60
> bytes, and we can't use relative addressing for most of them.
> Therefore, lower BPF_MAX_TRAMP_LINKS on s390x.
>
> Signed-off-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> ---
>  arch/s390/net/bpf_jit_comp.c | 535 +++++++++++++++++++++++++++++++++--
>  include/linux/bpf.h          |   4 +
>  2 files changed, 517 insertions(+), 22 deletions(-)
>

[...]

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index cf89504c8dda..52ff43bbf996 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -943,7 +943,11 @@ struct btf_func_model {
>  /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
>   * bytes on x86.
>   */
> +#if defined(__s390x__)
> +#define BPF_MAX_TRAMP_LINKS 27
> +#else
>  #define BPF_MAX_TRAMP_LINKS 38
> +#endif

If we turn this into an enum definition, then on the selftests side we
can just discover the value from vmlinux BTF, instead of hard-coding
arch-specific constants. Thoughts?
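
A minimal sketch of the enum variant, assuming the same two values as in
the patch. The struct and the helper below are hypothetical stand-ins for
illustration only (the real struct bpf_tramp_links holds
struct bpf_tramp_link pointers):

```c
#include <stddef.h>

/* Enumerators, unlike #define constants, are recorded in BTF, so the
 * value becomes discoverable from vmlinux BTF.
 */
enum {
#if defined(__s390x__)
	BPF_MAX_TRAMP_LINKS = 27,
#else
	BPF_MAX_TRAMP_LINKS = 38,
#endif
};

/* Hypothetical, simplified stand-in for struct bpf_tramp_links. */
struct bpf_tramp_links_sketch {
	void *links[BPF_MAX_TRAMP_LINKS];
	int nr_links;
};

/* Hypothetical helper: number of slots in the links array. */
static inline size_t tramp_links_capacity(void)
{
	return sizeof(((struct bpf_tramp_links_sketch *)0)->links) /
	       sizeof(void *);
}
```

An enumerator is still a constant expression, so the array declaration
keeps working unchanged. The selftest-side lookup (for example loading
vmlinux BTF through libbpf's btf__load_vmlinux_btf() and searching the
enum types for the enumerator name) is not shown here.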

>
>  struct bpf_tramp_links {
>         struct bpf_tramp_link *links[BPF_MAX_TRAMP_LINKS];
> --
> 2.39.1
>