Re: Help with verifier failure

Yonghong Song <yhs@xxxxxx> · Thu, 22 Apr 2021 07:35:08 -0700

On 4/22/21 6:55 AM, Brendan Jackman wrote:
On Wed, 21 Apr 2021 at 18:59, Yonghong Song <yhs@xxxxxx> wrote:
On 4/21/21 8:06 AM, Yonghong Song wrote:
On 4/21/21 5:23 AM, Brendan Jackman wrote:
Thanks, Brendan. Looks at least the verifier failure is triggered
by recent clang changes. I will take a look whether we could
improve verifier for such a case and whether we could improve
clang to avoid generate such codes the verifier doesn't like.
Will get back to you once I had concrete analysis.


This seems like it must be a common pitfall, any idea what we can do
to fix it
and avoid it in future? Am I misunderstanding the issue?

First, for the example code you provided, I checked with llvm11, llvm12
and latest trunk llvm (llvm13-dev) and they all generated similar codes,
which may trigger verifier failure. Somehow you original code could be
different may only show up with a recent llvm, I guess.

Checking llvm IR, the divergence between "w2 = w8" and "if r8 < 0x1000"
appears in insn scheduling phase related handling PHIs. Need to further
check whether it is possible to prevent the compiler from generating
such codes.

The latest kernel already had the ability to track register equivalence.
However, the tracking is conservative for 32bit mov like "w2 = w8" as
you described in the above. if we have code like "r2 = r8; if r8 <
0x1000 ...", we will be all good.

The following hack fixed the issue,

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 58730872f7e5..54f418fd6a4a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7728,12 +7728,20 @@ static int check_alu_op(struct bpf_verifier_env
*env, struct bpf_insn *insn)
                                                  insn->src_reg);
                                          return -EACCES;
                                  } else if (src_reg->type == SCALAR_VALUE) {
+                                       /* If src_reg is in 32bit range,
there is
+                                        * no need to reset the ID.
+                                        */
+                                       bool is_32bit_src =
src_reg->umax_value <= 0x7fffffff;
+
+                                       if (is_32bit_src && !src_reg->id)
+                                               src_reg->id = ++env->id_gen;
                                          *dst_reg = *src_reg;
                                          /* Make sure ID is cleared
otherwise
                                           * dst_reg min/max could be
incorrectly
                                           * propagated into src_reg by
find_equal_scalars()
                                           */
-                                       dst_reg->id = 0;
+                                       if (!is_32bit_src)
+                                               dst_reg->id = 0;
                                          dst_reg->live |= REG_LIVE_WRITTEN;
                                          dst_reg->subreg_def =
env->insn_idx + 1;
                                  } else {

Basically, for a 32bit mov insn like "w2 = w8", if we can ensure
that "w8" is 32bit and has no possibility that upper 32bit is set
for r8, we can declare them equivalent. This fixed your issue.

Will try to submit a formal patch later.

Ah.. I did not realise this equivalence tracking with reg.id was there
for scalar values! I also didn't take any notice of the use of 32-bit
operations in the assembly, thanks for pointing that out.

Yes it sounds like this is certainly worth fixing in the kernel - even
if Clang stops generating the code today it will probably start doing
so again in the future. I can also help with the verifier work if
needed.

I won't have time for this in the next few days.
Considering the current upstream merge window is close, yes, please
go head to work on this. Thanks!