Re: [PATCH bpf-next v2 3/6] bpf: x86: Support in-register struct arguments

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 24 Aug 2022 12:05:46 -0700

On Thu, Aug 18, 2022 at 1:44 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Aug 17, 2022 at 09:56:23PM -0700, Yonghong Song wrote:
> >
> >
> > On 8/15/22 3:44 PM, Alexei Starovoitov wrote:
> > > On Thu, Aug 11, 2022 at 10:24 PM Yonghong Song <yhs@xxxxxx> wrote:
> > > >
> > > > In C, struct value can be passed as a function argument.
> > > > For small structs, struct value may be passed in
> > > > one or more registers. For trampoline based bpf programs,
> > > > This would cause complication since one-to-one mapping between
> > > > function argument and arch argument register is not valid
> > > > any more.
> > > >
> > > > To support struct value argument and make bpf programs
> > > > easy to write, the bpf program function parameter is
> > > > changed from struct type to a pointer to struct type.
> > > > The following is a simplified example.
> > > >
> > > > In one of later selftests, we have a bpf_testmod function:
> > > >      struct bpf_testmod_struct_arg_2 {
> > > >          long a;
> > > >          long b;
> > > >      };
> > > >      noinline int
> > > >      bpf_testmod_test_struct_arg_2(int a, struct bpf_testmod_struct_arg_2 b, int c) {
> > > >          bpf_testmod_test_struct_arg_result = a + b.a + b.b + c;
> > > >          return bpf_testmod_test_struct_arg_result;
> > > >      }
> > > >
> > > > When a bpf program is attached to the bpf_testmod function
> > > > bpf_testmod_test_struct_arg_2(), the bpf program may look like
> > > >      SEC("fentry/bpf_testmod_test_struct_arg_2")
> > > >      int BPF_PROG(test_struct_arg_3, int a, struct bpf_testmod_struct_arg_2 *b, int c)
> > > >      {
> > > >          t2_a = a;
> > > >          t2_b_a = b->a;
> > > >          t2_b_b = b->b;
> > > >          t2_c = c;
> > > >          return 0;
> > > >      }
> > > >
> > > > Basically struct value becomes a pointer to the struct.
> > > > The trampoline stack will be increased to store the stack values and
> > > > the pointer to these values will be saved in the stack slot corresponding
> > > > to that argument. For x86_64, the struct size is limited up to 16 bytes
> > > > so the struct can fit in one or two registers. The struct size of more
> > > > than 16 bytes is not supported now as our current use case is
> > > > for sockptr_t in the argument. We could handle such large struct's
> > > > in the future if we have concrete use cases.
> > > >
> > > > The main changes are in save_regs() and restore_regs(). The following
> > > > illustrated the trampoline asm codes for save_regs() and restore_regs().
> > > > save_regs():
> > > >      /* first argument */
> > > >      mov    DWORD PTR [rbp-0x18],edi
> > > >      /* second argument: struct, save actual values and put the pointer to the slot */
> > > >      lea    rax,[rbp-0x40]
> > > >      mov    QWORD PTR [rbp-0x10],rax
> > > >      mov    QWORD PTR [rbp-0x40],rsi
> > > >      mov    QWORD PTR [rbp-0x38],rdx
> > > >      /* third argument */
> > > >      mov    DWORD PTR [rbp-0x8],esi
> > > > restore_regs():
> > > >      mov    edi,DWORD PTR [rbp-0x18]
> > > >      mov    rsi,QWORD PTR [rbp-0x40]
> > > >      mov    rdx,QWORD PTR [rbp-0x38]
> > > >      mov    esi,DWORD PTR [rbp-0x8]
> > >
> > > Not sure whether it was discussed before, but
> > > why cannot we adjust the bpf side instead?
> > > Technically struct passing between bpf progs was never
> > > officially supported. llvm generates something.
> > > Probably always passes by reference, but we can adjust
> > > that behavior without breaking any programs because
> > > we don't have bpf libraries. Programs are fully contained
> > > in one or few files. libbpf can do static linking, but
> > > without any actual libraries the chance of breaking
> > > backward compat is close to zero.
> >
> > Agree. At this point, we don't need to worry about
> > compatibility between bpf program and bpf program libraries.
> >
> > > Can we teach llvm to pass sizeof(struct) <= 16 in
> > > two bpf registers?
> >
> > Yes, we can. I just hacked llvm and was able to
> > do that.
> >
> > > Then we wouldn't need to have a discrepancy between
> > > kernel function prototype and bpf fentry prog proto.
> > > Both will have struct by value in the same spot.
> > > The trampoline generation will be simpler for x86 and
> > > its runtime faster too.
> >
> > I tested x86 and arm64 both supports two registers
> > for a 16 byte struct.
> >
> > > The other architectures that pass small structs by reference
> > > can do a bit more work in the trampoline: copy up to 16 byte
> > > and bpf prog side will see it as they were passed in 'registers'.
> > > wdyt?
> >
> > I know systemz and Hexagon will pass by reference for any
> > struct size >= 8 bytes. Didn't complete check others.
> >
> > But since x86 and arm64 supports direct value passing
> > with two registers, we should be okay. As you mentioned,
> > we could support systemz/hexagon style of struct passing
> > by copying the values to the stack.
> >
> >
> > But I have a problem how to define a user friendly
> > macro like BPF_PROG for user to use.
> >
> > Let us say, we have a program like below:
> > SEC("fentry/bpf_testmod_test_struct_arg_1")
> > int BPF_PROG(test_struct_arg_1, struct bpf_testmod_struct_arg_2 *a, int b,
> > int c) {
> > ...
> > }
> >
> > We have BPF_PROG macro definition here:
> >
> > #define BPF_PROG(name, args...)     \
> > name(unsigned long long *ctx);     \
> > static __always_inline typeof(name(0))     \
> > ____##name(unsigned long long *ctx, ##args);     \
> > typeof(name(0)) name(unsigned long long *ctx)     \
> > {     \
> >         _Pragma("GCC diagnostic push")      \
> >         _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")      \
> >         return ____##name(___bpf_ctx_cast(args));      \
> >         _Pragma("GCC diagnostic pop")      \
> > }     \
> > static __always_inline typeof(name(0))     \
> > ____##name(unsigned long long *ctx, ##args)
> >
> > Some we have static function definition
> >
> > int ____test_struct_arg_1(unsigned long long *ctx, struct
> > bpf_testmod_struct_arg_2 *a, int b, int c);
> >
> > But the function call inside the function test_struct_arg_1()
> > is
> >   ____test_struct_arg_1(ctx, ctx[0], ctx[1], ctx[2]);
> >
> > We have two problems here:
> >   ____test_struct_arg_1(ctx, ctx[0], ctx[1], ctx[2])
> > does not match the static function declaration.
> > This is not problem if everything is int/ptr type.
> > If one of argument is structure type, we will have
> > type conversion problem. Let us this can be resolved
> > somehow through some hack.
> >
> > More importantly, because some structure may take two
> > registers,
> >    ____test_struct_arg_1(ctx, ctx[0], ctx[1], ctx[2])
> > may not be correct. In my above example, if the
> > structure size is 16 bytes,
> > then the actual call should be
> >    ____test_struct_arg_1(ctx, ctx[0], ctx[1], ctx[2], ctx[3])
> > So we need to provide how many extra registers are needed
> > beyond ##args in the macro. I have not tried how to
> > resolve this but this extra information in the macro
> > definite is not user friendly.
> >
> > Not sure what is the best way to handle this issue (##args is not precise
> > and needs addition registers for >8 struct arguments).
>
> The kernel is using this trick to cast 8 byte structs to u64:
> /* cast any integer, pointer, or small struct to u64 */
> #define UINTTYPE(size) \
>         __typeof__(__builtin_choose_expr(size == 1,  (u8)1, \
>                    __builtin_choose_expr(size == 2, (u16)2, \
>                    __builtin_choose_expr(size == 4, (u32)3, \
>                    __builtin_choose_expr(size == 8, (u64)4, \
>                                          (void)5)))))
> #define __CAST_TO_U64(x) ({ \
>         typeof(x) __src = (x); \
>         UINTTYPE(sizeof(x)) __dst; \
>         memcpy(&__dst, &__src, sizeof(__dst)); \
>         (u64)__dst; })
>
> casting 16 byte struct to two u64 can be similar.
> Ideally we would declare bpf prog as:
> SEC("fentry/bpf_testmod_test_struct_arg_1")
> int BPF_PROG(test_struct_arg_1, struct bpf_testmod_struct_arg_2 a, int b, int c) {
> note there is no '*'. It's struct by value.

Agree. So I tried to compile this:

$ git diff

diff --git a/tools/testing/selftests/bpf/progs/test_vmlinux.c
b/tools/testing/selftests/bpf/progs/test_vmlinux.c
index e9dfa0313d1b..dccb9ae9801f 100644
--- a/tools/testing/selftests/bpf/progs/test_vmlinux.c
+++ b/tools/testing/selftests/bpf/progs/test_vmlinux.c
@@ -15,6 +15,16 @@ bool tp_btf_called = false;
 bool kprobe_called = false;
 bool fentry_called = false;

+typedef struct {
+       void *x;
+       int t;
+} sockptr;
+
+static int blah(sockptr x)
+{
+       return x.t;
+}
+
 SEC("tp/syscalls/sys_enter_nanosleep")
 int handle__tp(struct trace_event_raw_sys_enter *args)
 {
@@ -30,7 +40,14 @@ int handle__tp(struct trace_event_raw_sys_enter *args)
                return 0;

        tp_called = true;
-       return 0;
+
+       return blah(({ union {
+               struct { u64 x, y; } z;
+               sockptr s;
+               } tmp = {.z = {0, 1}};
+
+               tmp.s;
+       }));
 }

 SEC("raw_tp/sys_enter")


And it compiled. So I think it's possible to do u64 to struct
conversion using this approach.
We'd have to define two variations of macro -- one for structs <= 8
bytes, another for structs > 8 and <= 16 bytes. One will "consume"
single ctx[] slot, another -- will consume both. And then each variant
knows which other macro to refer to after itself.

A bit of macro hackery, but it should work.


> The main challenge is how to do the math in the BPF_PROG macros.
> Currently it's doing:
> #define ___bpf_ctx_cast1(x)           ___bpf_ctx_cast0(), (void *)ctx[0]
> #define ___bpf_ctx_cast2(x, args...)  ___bpf_ctx_cast1(args), (void *)ctx[1]
> #define ___bpf_ctx_cast3(x, args...)  ___bpf_ctx_cast2(args), (void *)ctx[2]
> #define ___bpf_ctx_cast4(x, args...)  ___bpf_ctx_cast3(args), (void *)ctx[3]
>
> The ctx[index] is one-to-one with argument position.
> That 'index' needs to be calculated.
> Maybe something like UINTTYPE() that applies to previous arguments?
> #define REG_CNT(arg) \
>         __builtin_choose_expr(sizeof(arg) == 1,  1, \
>         __builtin_choose_expr(sizeof(arg) == 2,  1, \
>         __builtin_choose_expr(sizeof(arg) == 4,  1, \
>         __builtin_choose_expr(sizeof(arg) == 8,  1, \
>         __builtin_choose_expr(sizeof(arg) == 16,  2, \
>                                          (void)0)))))
>
> #define ___bpf_reg_cnt0()            0
> #define ___bpf_reg_cnt1(x)          ___bpf_reg_cnt0() + REG_CNT(x)
> #define ___bpf_reg_cnt2(x, args...) ___bpf_reg_cnt1(args) + REG_CNT(x)
> #define ___bpf_reg_cnt(args...)    ___bpf_apply(___bpf_reg_cnt, ___bpf_narg(args))(args)
>
> This way the macro will calculate the index inside ctx[] array.
>
> and then inside ___bpf_ctx_castN macro use ___bpf_reg_cnt.
> Instead of:
> ___bpf_ctx_cast3(x, args...)  ___bpf_ctx_cast2(args), (void *)ctx[2]
> it will be
> ___bpf_ctx_cast3(x, args...)  ___bpf_ctx_cast2(args), \
>   __builtin_choose_expr(sizeof(x) <= 8, (void *)ctx[___bpf_reg_cnt(args)],
>                         *(typeof(x) *) &ctx[___bpf_reg_cnt(args)])
>
> x - is one of the arguments.
> args - all args before 'x'. Doing __bpf_reg_cnt on them should calculate the index.
> *(typeof(x) *)& should type cast to struct of 16 bytes.
>
> Rough idea, of course.
>
> Another alternative is instead of:
> #define BPF_PROG(name, args...)
> name(unsigned long long *ctx);
> do:
> #define BPF_PROG(name, args...)
> struct XX {
>   macro inserts all 'args' here separated by ; so it becomes a proper struct
> };
> name(struct XX *ctx);
>
> and then instead of doing ___bpf_ctx_castN for each argument
> do single cast of all of 'u64 ctx[N]' passed from fentry into 'struct XX *'.
> The problem with this approach that small args like char, short, int needs to
> be declared in struct XX with __align__(8).
>
> Both approaches may be workable?