On Thu, Jan 30, 2020 at 9:48 PM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
> at the moment. I'll take a look in the morning. That fragment 55,56,
> 57 are coming from a zext in llvm though.

I don't think so. Here is how IR looks after all optimizations
and right before instruction selection:

  %call12 = call i32 inttoptr (i64 67 to i32 (i8*, i8*, i32, i64)*)(i8* %ctx, i8* nonnull %call8, i32 800, i64 256) #2
  %cmp = icmp slt i32 %call12, 0
  br i1 %cmp, label %cleanup, label %if.end15

if.end15:                                         ; preds = %if.end11
  %idx.ext70 = zext i32 %call12 to i64
  %add.ptr = getelementptr i8, i8* %call8, i64 %idx.ext70
  %sub = sub nsw i32 800, %call12
  %call16 = call i32 inttoptr (i64 67 to i32 (i8*, i8*, i32, i64)*)(i8* %ctx, i8* %add.ptr, i32 %sub, i64 0) #2
  %cmp17 = icmp slt i32 %call16, 0
  br i1 %cmp17, label %cleanup, label %if.end20

and corresponding C code:

  usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
  if (usize < 0)
          return 0;

  ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
  if (ksize < 0)

%idx.ext70 = zext i32 %call12 to i64
that you see is a part of 'raw_data + usize' math.
The result of first bpf_get_stack() is directly passed into
"icmp slt i32 %call12, 0" and during instruction selection
the backend does sign extension with <<32 s>>32.

I agree that peephole zext->mov32_64 is correct and a nice optimization,
but I still don't see how it helps this case.
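To make the difference between the two extensions concrete, here is a rough host-side model in plain C. It is only a sketch, not BPF backend code: the function names are made up for illustration, and it assumes a two's-complement target where >> on a signed 64-bit value is an arithmetic shift (true for the usual compilers, but implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the "<<32 s>>32" sequence on a 64-bit register:
 * shift the low 32 bits to the top, then arithmetic-shift them back so
 * that bit 31 is copied into bits 32..63.
 * Assumes two's complement and an arithmetic >> on int64_t. */
static uint64_t sext32_via_shifts(uint64_t reg)
{
	return (uint64_t)((int64_t)(reg << 32) >> 32);
}

/* Hypothetical model of a 32-bit move used as zext: keep the low
 * 32 bits and clear bits 32..63. */
static uint64_t zext32(uint64_t reg)
{
	return (uint64_t)(uint32_t)reg;
}

/* Small self-check covering a negative, a positive, and a "dirty upper
 * half" register value. */
static void check_extension_model(void)
{
	/* -14 as a 32-bit value, sitting in the low half of a register */
	uint64_t neg = 0x00000000FFFFFFF2ULL;

	assert(sext32_via_shifts(neg) == 0xFFFFFFFFFFFFFFF2ULL);
	assert(zext32(neg) == 0x00000000FFFFFFF2ULL);

	/* non-negative values come out the same either way */
	assert(sext32_via_shifts(800) == 800);
	assert(zext32(800) == 800);

	/* garbage in the upper half is discarded by both */
	assert(sext32_via_shifts(0xDEADBEEF00000320ULL) == 0x320);
	assert(zext32(0xDEADBEEF00000320ULL) == 0x320);
}
```

Calling check_extension_model() from a main() exercises the model. The two helpers only disagree when bit 31 is set, which is the case the signed compare against 0 has to see correctly as a 64-bit value.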