Re: [PATCH v2 bpf-next 1/8] bpf: Add gen_epilogue to bpf_verifier_ops

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 21 Aug 2024 17:22:59 -0700

On Wed, Aug 21, 2024 at 4:35 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> From: Martin KaFai Lau <martin.lau@xxxxxxxxxx>
>
> This patch adds a .gen_epilogue to the bpf_verifier_ops. It is similar
> to the existing .gen_prologue. Instead of allowing a subsystem
> to run code at the beginning of a bpf prog, it allows the subsystem
> to run code just before the bpf prog exit.
>
> One of the use case is to allow the upcoming bpf qdisc to ensure that
> the skb->dev is the same as the qdisc->dev_queue->dev. The bpf qdisc
> struct_ops implementation could either fix it up or drop the skb.
> Another use case could be in bpf_tcp_ca.c to enforce snd_cwnd
> has sane value (e.g. non zero).
>
> The epilogue can do the useful thing (like checking skb->dev) if it
> can access the bpf prog's ctx. Unlike prologue, r1 may not hold the
> ctx pointer. This patch saves the r1 in the stack if the .gen_epilogue
> has returned some instructions in the "epilogue_buf".
>
> The existing .gen_prologue is done in convert_ctx_accesses().
> The new .gen_epilogue is done in the convert_ctx_accesses() also.
> When it sees the (BPF_JMP | BPF_EXIT) instruction, it will be patched
> with the earlier generated "epilogue_buf". The epilogue patching is
> only done for the main prog.
>
> Reviewed-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> Signed-off-by: Martin KaFai Lau <martin.lau@xxxxxxxxxx>
> ---
>  include/linux/bpf.h   |  2 ++
>  kernel/bpf/verifier.c | 34 ++++++++++++++++++++++++++++++++--
>  2 files changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f0192c173ed8..8ee9d87c332a 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -974,6 +974,8 @@ struct bpf_verifier_ops {
>                                 struct bpf_insn_access_aux *info);
>         int (*gen_prologue)(struct bpf_insn *insn, bool direct_write,
>                             const struct bpf_prog *prog);
> +       int (*gen_epilogue)(struct bpf_insn *insn, const struct bpf_prog *prog,
> +                           s16 ctx_stack_off);
>         int (*gen_ld_abs)(const struct bpf_insn *orig,
>                           struct bpf_insn *insn_buf);
>         u32 (*convert_ctx_access)(enum bpf_access_type type,
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index df3be12096cf..bbb655f0c7b5 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -19610,15 +19610,37 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
>   */
>  static int convert_ctx_accesses(struct bpf_verifier_env *env)
>  {
> +       struct bpf_subprog_info *subprogs = env->subprog_info;
>         const struct bpf_verifier_ops *ops = env->ops;
> -       int i, cnt, size, ctx_field_size, delta = 0;
> +       int i, cnt, size, ctx_field_size, delta = 0, epilogue_cnt = 0;
>         const int insn_cnt = env->prog->len;
> -       struct bpf_insn insn_buf[16], *insn;
> +       struct bpf_insn insn_buf[16], epilogue_buf[16], *insn;

This is a noticeable stack increase.
Maybe let's move both to env?

>         u32 target_size, size_default, off;
>         struct bpf_prog *new_prog;
>         enum bpf_access_type type;
>         bool is_narrower_load;
>
> +       if (ops->gen_epilogue) {
> +               epilogue_cnt = ops->gen_epilogue(epilogue_buf, env->prog,
> +                                                -(subprogs[0].stack_depth + 8));
> +               if (epilogue_cnt >= ARRAY_SIZE(epilogue_buf)) {
> +                       verbose(env, "bpf verifier is misconfigured\n");
> +                       return -EINVAL;
> +               } else if (epilogue_cnt) {
> +                       /* Save the ARG_PTR_TO_CTX for the epilogue to use */
> +                       cnt = 0;
> +                       subprogs[0].stack_depth += 8;
> +                       insn_buf[cnt++] = BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_1,
> +                                                     -subprogs[0].stack_depth);
> +                       insn_buf[cnt++] = env->prog->insnsi[0];
> +                       new_prog = bpf_patch_insn_data(env, 0, insn_buf, cnt);
> +                       if (!new_prog)
> +                               return -ENOMEM;
> +                       env->prog = new_prog;
> +                       delta += cnt - 1;

I suspect this is buggy.
See commit 5337ac4c9b80 ("bpf: Fix the corner case with may_goto and
jump to the 1st insn.")

> +               }
> +       }
> +
>         if (ops->gen_prologue || env->seen_direct_write) {
>                 if (!ops->gen_prologue) {
>                         verbose(env, "bpf verifier is misconfigured\n");
> @@ -19671,6 +19693,13 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>                         insn->code = BPF_STX | BPF_PROBE_ATOMIC | BPF_SIZE(insn->code);
>                         env->prog->aux->num_exentries++;
>                         continue;
> +               } else if (insn->code == (BPF_JMP | BPF_EXIT) &&
> +                          epilogue_cnt &&
> +                          i + delta < subprogs[1].start) {
> +                       /* Generate epilogue for the main prog */
> +                       memcpy(insn_buf, epilogue_buf, sizeof(epilogue_buf));
> +                       cnt = epilogue_cnt;
> +                       goto patch_insn_buf;

That's quite a bit of copy paste of epilogue if the program contains
multiple bpf_exit insns.
I think llvm generates it in such a way that often there is only
one bpf_exit. But gcc might be using bpf_exit liberally.

So let's patch it only once and replace other bpf_exit with jmp
to one patched place ?

That's pretty much what x86 JIT does when it converts bpf_exit.

>                 } else {
>                         continue;
>                 }
> @@ -19807,6 +19836,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>                                                        insn->dst_reg, insn->dst_reg,
>                                                        size * 8, 0);
>
> +patch_insn_buf:
>                 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
>                 if (!new_prog)
>                         return -ENOMEM;
> --
> 2.43.5
>