On 1/30/20 4:44 PM, John Fastabend wrote:
> Yonghong Song wrote:
>>
>> On 1/30/20 3:34 PM, John Fastabend wrote:
>>> Alexei Starovoitov wrote:
>>>> On Thu, Jan 30, 2020 at 09:38:07AM -0800, John Fastabend wrote:
>>>>> Alexei Starovoitov wrote:
>>>>>> On Wed, Jan 29, 2020 at 02:52:10PM -0800, John Fastabend wrote:
>>>>>>> Daniel Borkmann wrote:
>>>>>>>> On 1/29/20 8:28 PM, Alexei Starovoitov wrote:
>>>>>>>>> On Wed, Jan 29, 2020 at 8:25 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Fixes: 849fa50662fbc ("bpf/verifier: refine retval R0 state for bpf_get_stack helper")
>>>>>>>>>>> Signed-off-by: John Fastabend <john.fastabend@xxxxxxxxx>
>>>>>>>>>>
>>>>>>>>>> Applied, thanks!
>>>>>>>>>
>>>>>>>>> Daniel,
>>>>>>>>> did you run the selftests before applying?
>>>>>>>>> This patch breaks two.
>>>>>>>>> We have to find a different fix.
>>>>>>>>>
>>>>>>>>> ./test_progs -t get_stack
>>>>>>>>> 68: (85) call bpf_get_stack#67
>>>>>>>>> R0=inv(id=0,smax_value=800) R1_w=ctx(id=0,off=0,imm=0)
>>>>>>>>> R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=4294967295,var_off=(0x0;
>>>>>>>>> 0xffffffff)) R3_w=inv(id=0,umax_value=4294967295,var_off=(0x0;
>>>>>>>>> 0xffffffff)) R4_w=inv0 R6=ctx(id=0,off=0,im?
>>>>>>>>> R2 unbounded memory access, make sure to bounds check any array access
>>>>>>>>> into a map
>>>>>>>>
>>>>>>>> Sigh, I had it in my wip pre-rebase tree when running tests. I've reverted it
>>>>>>>> from the tree since this needs to be addressed. Sorry for the trouble.
>>>>>>>
>>>>>>> Thanks, I'm looking into it now. Not sure how I missed it in the
>>>>>>> selftests; either it was an older branch or I missed the test somehow.
>>>>>>> I've updated my toolchain and kernel now, so it shouldn't happen again.
>>>>>>
>>>>>> Looks like smax_value was nuked by the <<32 >>32 shifts.
>>>>>>
>>>>>> 53: (bf) r8 = r0     // R0=inv(id=0,smax_value=800)
>>>>>> 54: (67) r8 <<= 32   // R8->smax_value = S64_MAX; in adjust_scalar_min_max_vals()
>>>>>> 55: (c7) r8 s>>= 32
>>>>>> ; if (usize < 0)
>>>>>> 56: (c5) if r8 s< 0x0 goto pc+28
>>>>>> // and here the "less than zero" check doesn't help anymore.
>>>>>>
>>>>>> Not sure how to fix it yet, but the code pattern used in
>>>>>> progs/test_get_stack_rawtp.c is real. Plenty of bpf progs rely on this.
>>>>>
>>>>> OK, I see what happened: I have some patches on my llvm tree and forgot to
>>>>> pop them off before running selftests :/ This <<=32 s>>=32 pattern pops up
>>>>> in a few places for us and causes verifier trouble whenever it is hit.
>>>>>
>>>>> I think I have a fix for this in llvm, if that is OK. And we can make
>>>>> the BPF_RSH and BPF_LSH verifier bounds tighter if we also define the
>>>>> architecture expectation on the jit side. For example, in the x86 jit code here,
>>>>>
>>>>>   146: shl $0x20,%rdi
>>>>>   14a: shr $0x20,%rdi
>>>>>
>>>>> the shr will clear the most significant bit, so we can say something about
>>>>> the minimum signed value. I'll generate a couple of patches today and send
>>>>> them out to discuss. Probably easier to explain with code and examples.
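[A quick aside to make the jit observation above concrete. The snippet below is a
standalone userspace sketch, not verifier or jit code, and the sample inputs are
arbitrary; it only demonstrates the bound the logical shl/shr-by-32 pair
guarantees, namely that the result always fits in the low 32 bits, so something
like smin_value >= 0 and umax_value <= U32_MAX could be derived for it.]

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* arbitrary sample inputs, including ones with high bits set */
	uint64_t inputs[] = { 0, 800, 0x8000000080000000ULL, UINT64_MAX };

	for (unsigned i = 0; i < sizeof(inputs) / sizeof(inputs[0]); i++) {
		/* shl $0x20,%rdi ; shr $0x20,%rdi */
		uint64_t r = (inputs[i] << 32) >> 32;

		/* the logical shr leaves only the low 32 bits, so the result
		 * is always non-negative when viewed as s64 and <= U32_MAX
		 */
		assert((int64_t)r >= 0 && r <= UINT32_MAX);
		printf("in=%#llx out=%#llx\n",
		       (unsigned long long)inputs[i], (unsigned long long)r);
	}
	return 0;
}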
>>>>
>>>> How about we detect this pattern on the verifier side and replace it with a
>>>> pseudo insn that will do a 32-bit sign extend. Most archs have a special
>>>> cpu instruction to do this much more efficiently than two shifts.
>>>> If the JIT doesn't implement that pseudo yet, the verifier can convert
>>>> it back to two shifts.
>>>> Then during verification it will process the pseudo_sign_extend op easily.
>>>> So the idea:
>>>> 1. pattern match the sequence of two shifts in a pass similar to
>>>>    replace_map_fd_with_map_ptr() before the main do_check()
>>>> 2. pseudo_sign_extend gets processed in do_check(), doing the right thing
>>>>    with bpf_reg_state
>>>> 3. JIT this pseudo insn or convert it back
>>>>
>>>> Long term we can upgrade this pseudo insn into uapi and let llvm emit it.
>>>
>>> I'm not sure pattern matching in the verifier is best. This particular
>>> case of lsh/rsh games is the result of the BPF backend generating it from
>>> an LLVM IR zext.
>>>
>>> Here is the LLVM IR generated from test_get_stack_rawtp that produces
>>> the zext:
>>>
>>>   %26 = call i32 inttoptr (i64 67 to i32 (i8*, i8*, i32, i64)*)(i8* %0, i8* nonnull %23, i32 800, i64 256) #3, !dbg !166
>>>   call void @llvm.dbg.value(metadata i32 %26, metadata !124, metadata !DIExpression()), !dbg !130
>>>   %27 = icmp slt i32 %26, 0, !dbg !167
>>>   br i1 %27, label %41, label %28, !dbg !169
>>>
>>> 28:                                               ; preds = %25
>>>   %29 = zext i32 %26 to i64, !dbg !170
>>>   %30 = getelementptr i8, i8* %23, i64 %29, !dbg !170
>>>
>>> Clang wants to do a zext because we are promoting an i32 to an i64. In the
>>> BPF backend code we pattern match this as follows,
>>>
>>> def : Pat<(i64 (zext GPR32:$src)),
>>>           (SRL_ri (SLL_ri (MOV_32_64 GPR32:$src), 32), 32)>;
>>>
>>> which generates the object code (again from test_get_stack_rawtp),
>>>
>>> 56: bc 81 00 00 00 00 00 00  w1 = w8
>>> 57: 67 01 00 00 20 00 00 00  r1 <<= 32
>>> 58: 77 01 00 00 20 00 00 00  r1 >>= 32
>>>
>>> Unfortunately, this is a pretty poor form of zext from the verifier's point
>>> of view; it completely nukes the bounds, as you observed. So how about doing
>>> something a bit simpler on the backend side? Noting that a 32-bit to 32-bit
>>> move zero-extends the upper bits on x86, and we also make that assumption
>>> elsewhere, it should be safe to implement the zext from the object dump
>>> above as just the mov,
>>>
>>> w1 = w8
>>>
>>> which we can implement in the backend with this patch:
>>>
>>> diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td
>>> index 0f39294..a187103 100644
>>> --- a/llvm/lib/Target/BPF/BPFInstrInfo.td
>>> +++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
>>> @@ -733,7 +733,7 @@ def : Pat<(i64 (sext GPR32:$src)),
>>>                      (SRA_ri (SLL_ri (MOV_32_64 GPR32:$src), 32), 32)>;
>>>
>>>  def : Pat<(i64 (zext GPR32:$src)),
>>> -          (SRL_ri (SLL_ri (MOV_32_64 GPR32:$src), 32), 32)>;
>>> +          (MOV_32_64 GPR32:$src)>;
>>
>> Thanks. This should work.
>>
>> Also, in the BPF backend we have a pass in BPFMIPeephole.cpp to
>> remove such <<= and >>= pairs in general. It may miss some cases.
>>
>> But the verifier seems to handle <<= and >>= correctly, right?
>> Even if the pattern is there, the verifier should reach the same
>> conclusion as when it is not, right?
>>
>
> No, the verifier marks the register unknown and doesn't track it fully. We
> can perhaps improve the verifier, but the above is a nice fix/improvement
> for the backend imo regardless.
>
>	case BPF_LSH:
>		if (umax_val >= insn_bitness) {
>			/* Shifts greater than 31 or 63 are undefined.
>			 * This includes shifts by a negative number.
>			 */
>			mark_reg_unknown(env, regs, insn->dst_reg);
>			break;
>		}
>		/* We lose all sign bit information (except what we can pick
>		 * up from var_off)
>		 */

But in this case insn_bitness is 64, and umax_val = umin_val should presumably
be 32 since the original instruction is r1 <<= 32, so we never take the
mark_reg_unknown() branch; the signed bounds are dropped in the code below the
last comment. Probably something there needs improvement. I do agree that the
LLVM change is good; please submit a patch for it. Thanks!
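To make that point concrete, here is a standalone userspace sketch (plain C, not
verifier code; the sample return values are made up) showing that the shift pair
llvm emits is, arithmetically, just a 32-bit sign extension. Any value that fits
in s32, including one bounded by 800, passes through unchanged; the <= 800 bound
is lost only in the verifier's tracking, at the BPF_LSH step quoted above.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors insns 54 and 55 from the trace:
 *   54: (67) r8 <<= 32
 *   55: (c7) r8 s>>= 32
 * Relies on >> of a signed value being an arithmetic shift, as it is on
 * the usual compilers, to match BPF_ARSH.
 */
static int64_t shift_pair(int64_t r0)
{
	uint64_t r8 = (uint64_t)r0;

	r8 <<= 32;
	return (int64_t)r8 >> 32;
}

int main(void)
{
	/* made-up values a bpf_get_stack()-style helper might return */
	int64_t samples[] = { -14, 0, 8, 800 };

	for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		/* the value, and hence the [..., 800] bound, is preserved */
		assert(shift_pair(samples[i]) == samples[i]);
		printf("%lld -> %lld\n", (long long)samples[i],
		       (long long)shift_pair(samples[i]));
	}
	return 0;
}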
>
>>>
>>> Now the new object code is simply,
>>>
>>> 54: c6 08 14 00 00 00 00 00  if w8 s< 0 goto +20 <LBB0_6>
>>> 55: 1c 89 00 00 00 00 00 00  w9 -= w8
>>> 56: bc 81 00 00 00 00 00 00  w1 = w8
>>> 57: bf 72 00 00 00 00 00 00  r2 = r7
>>> 58: 0f 12 00 00 00 00 00 00  r2 += r1
>>> 59: bf 61 00 00 00 00 00 00  r1 = r6
>>> 60: bc 93 00 00 00 00 00 00  w3 = w9
>>> 61: b7 04 00 00 00 00 00 00  r4 = 0
>>> 62: 85 00 00 00 43 00 00 00  call 67
>>> ; if (ksize < 0)
>>>
>>> That is the block from your original trace. But one issue still remains:
>>> the llvm backend update alone doesn't fix the verifier problem created by
>>> my patch, because in the false branch after line 54 above we don't have
>>> the right bounds.
>>>
>>> 53: (bc) w8 = w0
>>> ; if (usize < 0)
>>> 54: (c6) if w8 s< 0x0 goto pc+20
>>> R0=inv(id=0,smax_value=800) R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R8_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R9=inv800 R10=fp0 fp-8=mmmm????
>>> ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
>>> 55: (1c) w9 -= w8
>>> ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
>>> 56: (bc) w1 = w8
>>> 57: (bf) r2 = r7
>>> 58: (0f) r2 += r1
>>> R0_rw=invP(id=0,smax_value=800) R6=ctx(id=0,off=0,imm=0) R7_rw=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R9_rw=inv800 R10=fp0 fp-8=mmmm????
>>> parent didn't have regs=1 stack=0 marks
>>> last_idx 52 first_idx 40
>>> regs=1 stack=0 before 52: (85) call bpf_get_stack#67
>>> ; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
>>> 59: (bf) r1 = r6
>>> 60: (bc) w3 = w9
>>> 61: (b7) r4 = 0
>>> 62: (85) call bpf_get_stack#67
>>>
>>> At insn 54, R0 has bounds [SMIN, 800], which as of insn 53 are also the
>>> bounds in w8, remembering that the 32-bit mov zero-extends there. So we
>>> should expect the false branch to have bounds [0, 800], exactly as we
>>> want. But we instead got only a umax_value? Digging deeper, we are
>>> failing here:
>>>
>>> /* Return true if VAL is compared with a s64 sign extended from s32, and they
>>>  * are with the same signedness.
>>>  */
>>> static bool cmp_val_with_extended_s64(s64 sval, struct bpf_reg_state *reg)
>>> {
>>>         return ((s32)sval >= 0 &&
>>>                 reg->smin_value >= 0 && reg->smax_value <= S32_MAX) ||
>>>                ((s32)sval < 0 &&
>>>                 reg->smax_value <= 0 && reg->smin_value >= S32_MIN);
>>> }
>>>
>>> This appears too conservative. I'll need to analyze that a bit, but it
>>> should be safe to relax it to catch the above <0 case. After that I
>>> expect we should be passing again.
>>>
>>> Sorry for the long thread, but those are the details. What do you think?
>>> In the meantime I'll generate the relaxed bounds on cmp_val_with_extended
>>> and see what we can cook up with Daniel. It avoids pseudo instructions
>>> and pattern matching, which I think is a bit more general.
>>>
>>> Thanks,
>>> John
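For reference, here is a tiny userspace harness (purely illustrative, not a
proposed kernel change; struct reg_bounds below is a trimmed-down stand-in for
bpf_reg_state with only the two fields the check reads) that plugs the R8 state
from the trace at insn 54 (smin_value=0, smax_value=umax_value=4294967295, per
var_off=(0x0; 0xffffffff)) into the predicate quoted above. The check returns
false because smax_value exceeds S32_MAX, which is why the 32-bit comparison
refines nothing beyond the umax in the false branch.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define S32_MAX INT32_MAX
#define S32_MIN INT32_MIN

/* trimmed-down stand-in for the bpf_reg_state fields used by the check */
struct reg_bounds {
	int64_t smin_value;
	int64_t smax_value;
};

/* same predicate as quoted above, retyped for userspace */
static bool cmp_val_with_extended_s64(int64_t sval, struct reg_bounds *reg)
{
	return ((int32_t)sval >= 0 &&
		reg->smin_value >= 0 && reg->smax_value <= S32_MAX) ||
	       ((int32_t)sval < 0 &&
		reg->smax_value <= 0 && reg->smin_value >= S32_MIN);
}

int main(void)
{
	/* R8 just before "54: if w8 s< 0x0": a zero-extended 32-bit value */
	struct reg_bounds r8 = { .smin_value = 0, .smax_value = 0xffffffffLL };

	/* prints 0: the first clause fails on smax_value <= S32_MAX */
	printf("cmp_val_with_extended_s64(0, &r8) = %d\n",
	       cmp_val_with_extended_s64(0, &r8));
	return 0;
}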