Re: Help with verifier failure

Brendan Jackman <jackmanb@xxxxxxxxxx> · Thu, 22 Apr 2021 15:55:22 +0200

On Wed, 21 Apr 2021 at 18:59, Yonghong Song <yhs@xxxxxx> wrote:
> On 4/21/21 8:06 AM, Yonghong Song wrote:
> > On 4/21/21 5:23 AM, Brendan Jackman wrote:
> > Thanks, Brendan. Looks at least the verifier failure is triggered
> > by recent clang changes. I will take a look whether we could
> > improve verifier for such a case and whether we could improve
> > clang to avoid generate such codes the verifier doesn't like.
> > Will get back to you once I had concrete analysis.
> >
> >>
> >> This seems like it must be a common pitfall, any idea what we can do
> >> to fix it
> >> and avoid it in future? Am I misunderstanding the issue?
>
> First, for the example code you provided, I checked with llvm11, llvm12
> and latest trunk llvm (llvm13-dev) and they all generated similar codes,
> which may trigger verifier failure. Somehow you original code could be
> different may only show up with a recent llvm, I guess.
>
> Checking llvm IR, the divergence between "w2 = w8" and "if r8 < 0x1000"
> appears in insn scheduling phase related handling PHIs. Need to further
> check whether it is possible to prevent the compiler from generating
> such codes.
>
> The latest kernel already had the ability to track register equivalence.
> However, the tracking is conservative for 32bit mov like "w2 = w8" as
> you described in the above. if we have code like "r2 = r8; if r8 <
> 0x1000 ...", we will be all good.
>
> The following hack fixed the issue,
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 58730872f7e5..54f418fd6a4a 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -7728,12 +7728,20 @@ static int check_alu_op(struct bpf_verifier_env
> *env, struct bpf_insn *insn)
>                                                  insn->src_reg);
>                                          return -EACCES;
>                                  } else if (src_reg->type == SCALAR_VALUE) {
> +                                       /* If src_reg is in 32bit range,
> there is
> +                                        * no need to reset the ID.
> +                                        */
> +                                       bool is_32bit_src =
> src_reg->umax_value <= 0x7fffffff;
> +
> +                                       if (is_32bit_src && !src_reg->id)
> +                                               src_reg->id = ++env->id_gen;
>                                          *dst_reg = *src_reg;
>                                          /* Make sure ID is cleared
> otherwise
>                                           * dst_reg min/max could be
> incorrectly
>                                           * propagated into src_reg by
> find_equal_scalars()
>                                           */
> -                                       dst_reg->id = 0;
> +                                       if (!is_32bit_src)
> +                                               dst_reg->id = 0;
>                                          dst_reg->live |= REG_LIVE_WRITTEN;
>                                          dst_reg->subreg_def =
> env->insn_idx + 1;
>                                  } else {
>
> Basically, for a 32bit mov insn like "w2 = w8", if we can ensure
> that "w8" is 32bit and has no possibility that upper 32bit is set
> for r8, we can declare them equivalent. This fixed your issue.
>
> Will try to submit a formal patch later.

Ah.. I did not realise this equivalence tracking with reg.id was there
for scalar values! I also didn't take any notice of the use of 32-bit
operations in the assembly, thanks for pointing that out.

Yes it sounds like this is certainly worth fixing in the kernel - even
if Clang stops generating the code today it will probably start doing
so again in the future. I can also help with the verifier work if
needed.