Re: [RFC PATCH 2/4] bpf: verifier, do explicit u32 bounds tracking

On 3/6/20 4:11 PM, John Fastabend wrote:
It is not possible for the current verifier to track u32 ALU ops and
jmps correctly. This can result in the verifier aborting with errors
even though the program should be verifiable. The Cilium code base has
hit this but worked around it by changing int variables to u64
variables and marking a few things volatile. It would be better to
avoid these tricks.

But the main reason to address this now is that
do_refine_retval_range() was assuming return values could not be
negative. Once we fix this in the next patches, code that was
previously working will no longer work. See the
do_refine_retval_range() patch for details.

The simplest example code snippet that illustrates the problem is
likely this,

  53: w8 = w0                    // r8 <- [0, S32_MAX],
                                 // w8 <- [-S32_MIN, X]
  54: w8 <s 0                    // r8 <- [0, U32_MAX]
                                 // w8 <- [0, X]

The expected 64-bit and 32-bit bounds after each line are shown on the
right. The current issue is that without the w* bounds we are forced
to use the worst-case bound of [0, U32_MAX]. To resolve this type of
case, where a jmp32 creates 32-bit bounds that diverge from the 64-bit
bounds, we add explicit 32-bit register bounds s32_{min|max}_value,
u32_{min|max}_value, and var32_off. Then the branch_taken logic that
creates new bounds can track the 32-bit bounds explicitly.
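
For reference, here is a minimal sketch of that new per-register state.
The field names are the ones listed above; grouping them into a
standalone struct is only for illustration, in the patch they sit next
to the existing 64-bit bounds in struct bpf_reg_state.

  struct bpf_reg_bounds32 {
          struct tnum var32_off; /* known bits of the low 32-bit subreg */
          s32 s32_min_value;     /* minimum possible (s32)value */
          s32 s32_max_value;     /* maximum possible (s32)value */
          u32 u32_min_value;     /* minimum possible (u32)value */
          u32 u32_max_value;     /* maximum possible (u32)value */
  };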

The next case we observed is ALU ops after the jmp32,

  53: w8 = w0                    // r8 <- [0, S32_MAX],
                                 // w8 <- [-S32_MIN, X]
  54: w8 <s 0                    // r8 <- [0, U32_MAX]
                                 // w8 <- [0, X]
  55: w8 += 1                    // r8 <- [0, U32_MAX+1]
                                 // w8 <- [0, X+1]

In order to keep the bounds accurate at this point we also need to
track ALU32 ops. To do this we add explicit alu32 logic for each of
the ALU ops: mov, add, sub, etc.
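
To illustrate what that alu32 logic looks like, below is a sketch of a
32-bit ADD bounds update that mirrors the existing 64-bit
scalar_min_max_add() pattern. The signed_add32_overflows() helper and
the exact shape of the function are assumptions for this sketch, not
necessarily what the patch does.

  /* 32-bit analogue of the existing signed_add_overflows(); assumed
   * helper for this sketch only.
   */
  static bool signed_add32_overflows(s32 a, s32 b)
  {
          /* Do the add in u32, where overflow is well defined */
          s32 res = (s32)((u32)a + (u32)b);

          if (b < 0)
                  return res > a;
          return res < a;
  }

  /* Sketch: update dst's 32-bit bounds for a BPF_ADD on the low 32-bit
   * subregisters, falling back to the widest bounds on overflow.
   */
  static void scalar32_min_max_add(struct bpf_reg_state *dst_reg,
                                   struct bpf_reg_state *src_reg)
  {
          s32 smin_val = src_reg->s32_min_value;
          s32 smax_val = src_reg->s32_max_value;
          u32 umin_val = src_reg->u32_min_value;
          u32 umax_val = src_reg->u32_max_value;

          if (signed_add32_overflows(dst_reg->s32_min_value, smin_val) ||
              signed_add32_overflows(dst_reg->s32_max_value, smax_val)) {
                  dst_reg->s32_min_value = S32_MIN;
                  dst_reg->s32_max_value = S32_MAX;
          } else {
                  dst_reg->s32_min_value += smin_val;
                  dst_reg->s32_max_value += smax_val;
          }
          if (dst_reg->u32_min_value + umin_val < umin_val ||
              dst_reg->u32_max_value + umax_val < umax_val) {
                  dst_reg->u32_min_value = 0;
                  dst_reg->u32_max_value = U32_MAX;
          } else {
                  dst_reg->u32_min_value += umin_val;
                  dst_reg->u32_max_value += umax_val;
          }
          /* drop any carry out of bit 31 from the 32-bit tnum */
          dst_reg->var32_off = tnum_cast(tnum_add(dst_reg->var32_off,
                                                  src_reg->var32_off), 4);
  }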

Finally, there is a question of how and when to merge bounds. The
cases are enumerated here,

1. MOV ALU32   - zext 32-bit -> 64-bit
2. MOV ALU64   - copy 64-bit -> 32-bit
3. op  ALU32   - zext 32-bit -> 64-bit
4. op  ALU64   - n/a
5. jmp ALU32   - 64-bit: var32_off | var64_off
6. jmp ALU64   - 32-bit: (>> (<< var64_off))

Details for each case,

For "MOV ALU32" the BPF arch zero extends, so we simply copy the
bounds from 32-bit into 64-bit, ensuring we cast var32_off. See
zext_32_to_64.

For "MOV ALU64" copy all bounds, including the 32-bit ones, into the
new register. If the src register had 32-bit bounds the dst register
will as well.

For "op ALU32" zero extend 32-bit into 64-bit, see zext_32_to_64.

For "op ALU64" calculate both 32-bit and 64-bit bounds; no merging is
done here, except for one special case. When an RSH or ARSH is done we
can't simply ignore the bits shifted from the 64-bit reg into the
32-bit subreg, so currently we just push the bounds from 64-bit into
32-bit. This will be correct in the sense that they will represent a
valid state of the register. However, we could lose some accuracy if
an ARSH follows a jmp32 operation. We can handle this special case in
a follow-up series.

For "jmp ALU32" mark the 64-bit reg unknown and recalculate the 64-bit
bounds from the tnum by setting var_off to ((<<(>>var_off)) |
var32_off). We special case when the 64-bit bounds have zero'd upper
32 bits, at which point we can simply copy the 32-bit bounds into the
64-bit register. This catches a common compiler trick where the upper
32 bits are zeroed and then 32-bit ops are used, followed by a 64-bit
compare or 64-bit op on a pointer. See __reg_combine_64_into_32().
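
A minimal sketch of that 32-bit-into-64-bit merge, built from the
existing tnum helpers (tnum_lshift/tnum_rshift/tnum_or/tnum_cast) and
the zext_32_to_64() helper quoted further down. The function name and
the conservative reset of the 64-bit bounds are assumptions for the
sketch; per the text the patch recalculates the 64-bit bounds from the
resulting tnum.

  static void __reg_combine_32_into_64(struct bpf_reg_state *reg)
  {
          /* Upper 32 bits of the 64-bit tnum: (<<(>>var_off)) */
          struct tnum hi32 = tnum_lshift(tnum_rshift(reg->var_off, 32), 32);

          if (hi32.value == 0 && hi32.mask == 0) {
                  /* Upper 32 bits are known zero, so the 32-bit bounds
                   * are also the 64-bit bounds (the compiler pattern
                   * described above).
                   */
                  zext_32_to_64(reg);
                  return;
          }

          /* Keep the known upper bits, OR in the 32-bit tnum, and fall
           * back to unknown 64-bit bounds.
           */
          reg->var_off = tnum_or(hi32, tnum_cast(reg->var32_off, 4));
          reg->smin_value = S64_MIN;
          reg->smax_value = S64_MAX;
          reg->umin_value = 0;
          reg->umax_value = U64_MAX;
  }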

For "jmp ALU64" cast the bounds of the 64-bit register to their 32-bit
counterparts, for example s32_min_value = (s32)reg->smin_value. For
the tnum use only the lower 32 bits via (>>(<<var_off)). See
__reg_combine_64_into_32().
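
And a literal sketch of that 64-bit-into-32-bit direction. The casts
follow the description above; a real implementation would likely want
to guard the cases where the 64-bit range straddles a 32-bit boundary,
since a plain cast there would not cover all possible low 32-bit
values.

  static void __reg_combine_64_into_32(struct bpf_reg_state *reg)
  {
          /* Derive the 32-bit bounds by casting the 64-bit ones */
          reg->s32_min_value = (s32)reg->smin_value;
          reg->s32_max_value = (s32)reg->smax_value;
          reg->u32_min_value = (u32)reg->umin_value;
          reg->u32_max_value = (u32)reg->umax_value;
          /* Keep only the lower 32 bits of the tnum: (>>(<<var_off)) */
          reg->var32_off = tnum_rshift(tnum_lshift(reg->var_off, 32), 32);
  }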

Some questions and TBDs aka the RFC part,

  0) opinions on the approach?

  1) Currently the tnum always has 64 bits, even for the 32-bit
     tnum tracking. I think ideally we would convert the tnum
     var32_off to a 32-bit type so the types are correct both in
     the verifier and in what it is tracking. But this in turn
     means we end up with tnum32 ops. It does not seem to be
     strictly needed though, so I'm saving it for a follow-up
     series. Any thoughts?

     struct tnum {
        u64 value;
        u64 mask;
     }

     struct tnum32 {
        u32 value;
        u32 mask;
     }

  2) I guess this patch could be split into two and still be
     workable: a first patch to do the alu32 logic and a second
     to do the jmp32 logic. I slightly prefer the single big
     patch, to keep all the logic in one place, but it makes for
     a large change. I'll tear it into two if folks care.

  3) This is passing test_verifier; I still need to run test_progs
     all the way through. My test box missed a few tests due to
     kernel feature flags.

  4) I'm testing Cilium now as well to be sure we are still
     working there.

  5) Do we like this approach? Should we push it all the way
     through to stable? We need something for stable and I
     haven't found a better solution yet. It's a good chunk
     of code though, so if we do that we probably want the
     fuzzers to run over it first.

  6) I need to do another review pass.

  7) I'm writing a set of verifier tests to exercise some of
     the more subtle 32 vs 64-bit cases now.

  8) I have a small test patch I use with test_verifier to
     dump the verifier state on every line, which I find
     helpful. I'll push it to bpf-next in case anyone else
     cares to use it.

While reading the patch, I have a few minor comments below.

Signed-off-by: John Fastabend <john.fastabend@xxxxxxxxx>
---
  tools/testing/selftests/bpf/test_verifier.c |    2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
[...]
+/* BPF architecture zero extends alu32 ops into 64-bit registers */
+static void zext_32_to_64(struct bpf_reg_state *reg)
+{
+	reg->var_off = reg->var32_off = tnum_cast(reg->var32_off, 4);
+	reg->umin_value = reg->smin_value = reg->u32_min_value;

reg->smin_value = reg->u32_min_value? Could you explain?

+	reg->umax_value = reg->smax_value = reg->u32_max_value;
+}
/* truncate register to smaller size (in bytes)
   * must be called with size < BPF_REG_SIZE
@@ -2791,6 +2957,7 @@ static int check_tp_buffer_access(struct bpf_verifier_env *env,
  static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
  {
  	u64 mask;
+	u32 u32mask;
/* clear high bits in bit representation */
  	reg->var_off = tnum_cast(reg->var_off, size);
@@ -2804,8 +2971,36 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
  		reg->umin_value = 0;
  		reg->umax_value = mask;
  	}
+
+	/* TBD this is its own patch */
+	if (reg->smin_value < 0 || reg->smax_value > reg->umax_value)

When could reg->smax_value > reg->umax_value happen?

+		reg->smax_value = reg->umax_value;
+	else
+		reg->umax_value = reg->smax_value;

I don't quite understand the above logic.


  	reg->smin_value = reg->umin_value;
-	reg->smax_value = reg->umax_value;
+
+	/* If size is smaller than 32bit register the 32bit register
+	 * values are also truncated.
+	 */
+	if (size >= 4) {
+		reg->var32_off = tnum_cast(reg->var_off, 4);
+		return;
+	}
+
+	reg->var32_off = tnum_cast(reg->var_off, size);
+	u32mask = ((u32)1 << (size *8)) - 1;

Looks like u32mask here is used to check whether u32_min_value and
u32_max_value agree on the bits above the truncated size. I don't
quite follow the logic below.

+	if ((reg->u32_min_value & ~u32mask) == (reg->u32_max_value & ~u32mask)) {
+		reg->u32_min_value &= mask;
+		reg->u32_max_value &= mask;
+	} else {
+		reg->u32_min_value = 0;
+		reg->u32_max_value = mask;
+	}
+	if (reg->s32_min_value < 0 || reg->s32_max_value > reg->u32_max_value)
+		reg->s32_max_value = reg->u32_max_value;
+	else
+		reg->u32_max_value = reg->s32_max_value;
+	reg->s32_min_value = reg->u32_min_value;
  }
static bool bpf_map_is_rdonly(const struct bpf_map *map)
@@ -4427,7 +4622,17 @@ static bool signed_add_overflows(s64 a, s64 b)
  	return res < a;
  }
[...]



