Alexei Starovoitov wrote:
> On Thu, Jan 30, 2020 at 9:48 PM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
> > at the moment. I'll take a look in the morning. That fragment 55,56,
> > 57 are coming from a zext in llvm though.
>

OK I see the disconnect now. I don't get the same instructions at all. I only
have the <<32 >> 32, no signed shift s>>32.

> I don't think so. Here is how IR looks after all optimizations
> and right before instruction selection:

Same llvm ir though

>   %call12 = call i32 inttoptr (i64 67 to i32 (i8*, i8*, i32,
> i64)*)(i8* %ctx, i8* nonnull %call8, i32 800, i64 256) #2
>   %cmp = icmp slt i32 %call12, 0
>   br i1 %cmp, label %cleanup, label %if.end15
>
> if.end15: ; preds = %if.end11
>   %idx.ext70 = zext i32 %call12 to i64
>   %add.ptr = getelementptr i8, i8* %call8, i64 %idx.ext70
>   %sub = sub nsw i32 800, %call12
>   %call16 = call i32 inttoptr (i64 67 to i32 (i8*, i8*, i32,
> i64)*)(i8* %ctx, i8* %add.ptr, i32 %sub, i64 0) #2
>   %cmp17 = icmp slt i32 %call16, 0
>   br i1 %cmp17, label %cleanup, label %if.end20
>

  %26 = call i32 inttoptr (i64 67 to i32 (i8*, i8*, i32, i64)*)(i8* %0, i8* nonnull %23, i32 800, i64 256) #3, !dbg !166
  %27 = icmp slt i32 %26, 0, !dbg !167
  br i1 %27, label %41, label %28, !dbg !169

28: ; preds = %25
  %29 = zext i32 %26 to i64, !dbg !170
  %30 = getelementptr i8, i8* %23, i64 %29, !dbg !170

> and corresponding C code:
> usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
> if (usize < 0)
>         return 0;
>
> ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> if (ksize < 0)

same as well. But my object code only has this (unpatched llvm)

      56:       bc 81 00 00 00 00 00 00 w1 = w8
      57:       67 01 00 00 20 00 00 00 r1 <<= 32
      58:       77 01 00 00 20 00 00 00 r1 >>= 32

>
> %idx.ext70 = zext i32 %call12 to i64
> that you see is a part of 'raw_data + usize' math.
> The result of first bpf_get_stack() is directly passed into
> "icmp slt i32 %call12, 0"
> and during instruction selection the backend does
> sign extension with <<32 s>>32.

Assuming latest bpf tree and llvm master branch?

>
> I agree that peephole zext->mov32_64 is correct and a nice optimization,
> but I still don't see how it helps this case.

I also don't mind building a pseudo instruction here for sign extension,
but it's not clear to me why we are getting different instruction
selections. Why is sext being chosen in your case?

.John
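
For readers following the thread, here is a minimal sketch in C of how the two
shift sequences discussed above differ on a 64-bit register. It is only an
illustration, not code from the BPF backend or the selftest: the function
names are hypothetical, and the value -14 is just an assumed example of a
negative 32-bit return value sitting in the low half of the register.

```c
#include <assert.h>
#include <stdint.h>

/* Models "r1 <<= 32; r1 >>= 32" (logical right shift): zero extension.
 * The upper 32 bits of the result are always cleared. */
static uint64_t zext_shift_pair(uint64_t r)
{
    r <<= 32;
    r >>= 32;
    return r;
}

/* Models "r1 <<= 32; r1 s>>= 32" (arithmetic right shift): sign extension.
 * Written on unsigned values so the C stays portable: the upper 32 bits
 * are filled with copies of bit 31 of the original low word. */
static uint64_t sext_shift_pair(uint64_t r)
{
    uint64_t t = r << 32;
    uint64_t sign_fill = (t >> 63) ? 0xffffffff00000000ULL : 0;
    return (t >> 32) | sign_fill;
}

/* Hypothetical check: a 32-bit -14 held in the low half of a register. */
static void check_extension_examples(void)
{
    uint64_t w8 = 0xfffffff2ULL; /* 32-bit two's complement pattern of -14 */

    /* Zero extension yields a large positive 64-bit value. */
    assert(zext_shift_pair(w8) == 0x00000000fffffff2ULL);

    /* Sign extension keeps the value negative as a 64-bit quantity. */
    assert(sext_shift_pair(w8) == 0xfffffffffffffff2ULL);

    /* For non-negative 32-bit values both sequences agree. */
    assert(zext_shift_pair(800) == sext_shift_pair(800));
}
```

The two sequences only diverge when bit 31 is set, which is the negative
return case that the signed "usize < 0" comparison cares about.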