On Thu, 2023-12-14 at 16:06 -0800, Andrii Nakryiko wrote:
> On Thu, Dec 14, 2023 at 8:26 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> >
> > On Thu, 2023-12-14 at 17:10 +0200, Eduard Zingerman wrote:
> > > [...]
> > > > The reason why retval checks fails is that the way you disable dead
> > > > code removal pass is not complete. Disable opt_remove_dead_code()
> > > > just prevent the instruction #30 from being removed, but also note
> > > > opt_hard_wire_dead_code_branches(), which convert conditional jump
> > > > into unconditional one, so #30 is still skipped.
> > > >
> > > > > Note that I tried this test with two functions:
> > > > > - bpf_get_current_cgroup_id, with this function I get retval 2, not 4 :)
> > > > > - bpf_get_prandom_u32, with this function I get a random retval each time.
> > > > >
> > > > > What is the expectation when 'bpf_get_current_cgroup_id' is used?
> > > > > That it is some known (to us) number, but verifier treats it as unknown scalar?
> > > > >
> > > >
> > > > Either one would work, but to make #30 always taken, r0 should be
> > > > non-zero.
> > >
> > > Oh, thank you, I made opt_hard_wire_dead_code_branches() a noop,
> > > replaced r0 = 0x4 by r0 /= 0 and see "divide error: 0000 [#1] PREEMPT SMP NOPTI"
> > > error in the kernel log on every second or third run of the test
> > > (when using prandom).
> > >
> > > Working to minimize the test case will share results a bit later.
> >
> > Here is the minimized version of the test:
> > https://gist.github.com/eddyz87/fb4d3c7d5aabdc2ae247ed73fefccd32
> >
> > If executed several times: ./test_progs -vvv -a verifier_and/pruning_test
> > it eventually crashes VM with the following error:
> >
> > [ 2.039066] divide error: 0000 [#1] PREEMPT SMP NOPTI
> > ...
> > [ 2.039987] Call Trace:
> > [ 2.039987] <TASK>
> > [ 2.039987] ? die+0x36/0x90
> > [ 2.039987] ? do_trap+0xdb/0x100
> > [ 2.039987] ? bpf_prog_32cfdb2c00b08250_pruning_test+0x4d/0x60
> > [ 2.039987] ? do_error_trap+0x7d/0x110
> > [ 2.039987] ? bpf_prog_32cfdb2c00b08250_pruning_test+0x4d/0x60
> > [ 2.039987] ? exc_divide_error+0x38/0x50
> > [ 2.039987] ? bpf_prog_32cfdb2c00b08250_pruning_test+0x4d/0x60
> > [ 2.039987] ? asm_exc_divide_error+0x1a/0x20
> > [ 2.039987] ? bpf_prog_32cfdb2c00b08250_pruning_test+0x4d/0x60
> > [ 2.039987] bpf_test_run+0x1b5/0x350
> > [ 2.039987] ? bpf_test_run+0x115/0x350
> > ...
> >
> > I'll continue debugging this a bit later today.
> >
>
> Great, thanks a lot, Eduard. Let's paste the program here for discussion:
>
> ...
>

I managed to minimize it a bit more, getting rid of r5
(not that it changes anything):

SEC("socket")
__success
__flag(BPF_F_TEST_STATE_FREQ)
__retval(42)
__naked void pruning_test(void)
{
	asm volatile (
	"	call %[bpf_get_prandom_u32];\n"
	"	r7 = r0;\n"
	"	r8 = r0;\n"
	"	call %[bpf_get_prandom_u32];\n"
	"	if r0 > 1 goto +0;\n"
	"	if r8 >= r0 goto 1f;\n"
	"	r8 += r8;\n"
	"	if r7 == 0 goto 1f;\n"
	"	r0 /= 0;\n"
	"1:	r0 = 42;\n"
	"	exit;\n"
	:
	: __imm(bpf_get_prandom_u32)
	: __clobber_all);
}

> If you agree with the analysis, we can start discussing what's the
> best way to fix this.

Please give me some more time, I'm adding some prints to understand
why the current logic does not mark r8 for the state that has
"if r8 >= r0 goto 1f;\n" as its first instruction; on the surface it should.
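
For reference, the runtime condition under which the test above reaches
"r0 /= 0" can be modelled in plain user-space C. This is only a sketch of
the concrete control flow, not verifier code: reaches_div() is a
hypothetical helper name, and a and b stand in for the two
bpf_get_prandom_u32() results. It assumes the comparisons are unsigned
and that "if r0 > 1 goto +0" falls through on both outcomes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of pruning_test (not kernel code).
 * a: value of the first bpf_get_prandom_u32() call (copied to r7, r8)
 * b: value of the second call (left in r0)
 * Returns true when the concrete run reaches "r0 /= 0".
 */
static bool reaches_div(uint32_t a, uint32_t b)
{
	uint64_t r7 = a, r8 = a, r0 = b;

	/* "if r0 > 1 goto +0": both outcomes continue at the next insn */

	if (r8 >= r0)		/* "if r8 >= r0 goto 1f" */
		return false;

	r8 += r8;		/* changes r8 only; r7 keeps the old value */
	(void)r8;

	if (r7 == 0)		/* "if r7 == 0 goto 1f" */
		return false;

	return true;		/* falls into "r0 /= 0" */
}
```

Under this model the division is reached whenever 0 < a < b, which would
be consistent with the crash showing up only on some runs when prandom
is used.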