On Mon, May 9, 2022 at 6:43 PM Kui-Feng Lee <kuifeng@xxxxxx> wrote: > > On Tue, 2022-05-10 at 01:29 +0000, Kui-Feng Lee wrote: > > On Mon, 2022-05-09 at 14:04 -0700, Alexei Starovoitov wrote: > > > On Sat, May 07, 2022 at 08:21:14PM -0700, Kui-Feng Lee wrote: > > > > > > > > + /* Prepare struct bpf_tramp_run_ctx. > > > > + * sub rsp, sizeof(struct bpf_tramp_run_ctx) > > > > + */ > > > > + EMIT4(0x48, 0x83, 0xEC, sizeof(struct > > > > bpf_tramp_run_ctx)); > > > > + > > > > if (fentry->nr_links) > > > > if (invoke_bpf(m, &prog, fentry, regs_off, > > > > flags & > > > > BPF_TRAMP_F_RET_FENTRY_RET)) > > > > @@ -2098,6 +2121,11 @@ int arch_prepare_bpf_trampoline(struct > > > > bpf_tramp_image *im, void *image, void *i > > > > } > > > > > > > > if (flags & BPF_TRAMP_F_CALL_ORIG) { > > > > + /* pop struct bpf_tramp_run_ctx > > > > + * add rsp, sizeof(struct bpf_tramp_run_ctx) > > > > + */ > > > > + EMIT4(0x48, 0x83, 0xC4, sizeof(struct > > > > bpf_tramp_run_ctx)); > > > > + > > > > restore_regs(m, &prog, nr_args, regs_off); > > > > > > > > /* call original function */ > > > > @@ -2110,6 +2138,11 @@ int arch_prepare_bpf_trampoline(struct > > > > bpf_tramp_image *im, void *image, void *i > > > > im->ip_after_call = prog; > > > > memcpy(prog, x86_nops[5], X86_PATCH_SIZE); > > > > prog += X86_PATCH_SIZE; > > > > + > > > > + /* Prepare struct bpf_tramp_run_ctx. > > > > + * sub rsp, sizeof(struct bpf_tramp_run_ctx) > > > > + */ > > > > + EMIT4(0x48, 0x83, 0xEC, sizeof(struct > > > > bpf_tramp_run_ctx)); > > > > } > > > > > > > > if (fmod_ret->nr_links) { > > > > @@ -2133,6 +2166,11 @@ int arch_prepare_bpf_trampoline(struct > > > > bpf_tramp_image *im, void *image, void *i > > > > goto cleanup; > > > > } > > > > > > > > + /* pop struct bpf_tramp_run_ctx > > > > + * add rsp, sizeof(struct bpf_tramp_run_ctx) > > > > + */ > > > > + EMIT4(0x48, 0x83, 0xC4, sizeof(struct > > > > bpf_tramp_run_ctx)); > > > > + > > > > > > What is the point of all of these additional sub/add rsp ? > > > It seems unconditionally increasing stack_size by sizeof(struct > > > bpf_tramp_run_ctx) > > > will achieve the same and above 4 extra insns won't be needed. > > > > I think you are right. > > > > The reason that I don't change stack_size is that we access arguments > or saved registers basing on stack_size. Once the stack_size is > changed, all these offsets should be changed too. That should be trivial. keep regs_off = stack_size; and increase stack_size right after. or some other math. Maybe worth introducing another _off in addition to int regs_off, ip_off, args_off. Definitely update 'Generated trampoline stack layout' comment and explain where bpf_tramp_run_ctx is in relation to regs_off. Maybe keeping regs_off (without new _off) is enough.