On Wed, Apr 22, 2020 at 10:41 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Apr 20, 2020 at 10:11 PM Andrii Nakryiko <andriin@xxxxxx> wrote:
> >
> > To make BPF verifier verbose log more relevant and easier to use to debug
> > verification failures, "pop" parts of the log that were successfully
> > verified. This has the effect of leaving only the verifier log that
> > corresponds to code branches that lead to verification failure, which in
> > practice should result in much shorter and more relevant verifier log dumps.
> > This behavior is made the default and can be overridden to do exhaustive
> > logging by specifying the BPF_LOG_LEVEL2 log level.
> >
> > Using BPF_LOG_LEVEL2 to disable this behavior is not ideal, because in some
> > cases it's good to have BPF_LOG_LEVEL2 per-instruction register dump
> > verbosity, but still have only relevant verifier branches logged. But for
> > this patch, I didn't want to add any new flags. It might be worthwhile to
> > just rethink how BPF verifier logging is performed and requested and
> > streamline it a bit. But this trimming of successfully verified branches
> > seems to be useful and a good default behavior.
> >
> > To test this, I modified runqslower slightly to introduce a read of an
> > uninitialized stack variable. Log (**truncated in the middle** to save many
> > lines in this commit message) BEFORE this change:
> >
> > ; int handle__sched_switch(u64 *ctx)
> > 0: (bf) r6 = r1
> > ; struct task_struct *prev = (struct task_struct *)ctx[1];
> > 1: (79) r1 = *(u64 *)(r6 +8)
> > func 'sched_switch' arg1 has btf_id 151 type STRUCT 'task_struct'
> > 2: (b7) r2 = 0
> > ; struct event event = {};
> > 3: (7b) *(u64 *)(r10 -24) = r2
> > last_idx 3 first_idx 0
> > regs=4 stack=0 before 2: (b7) r2 = 0
> > 4: (7b) *(u64 *)(r10 -32) = r2
> > 5: (7b) *(u64 *)(r10 -40) = r2
> > 6: (7b) *(u64 *)(r10 -48) = r2
> > ; if (prev->state == TASK_RUNNING)
> >
> > [ ... instructions from insn #7 through #50 are cut out ... ]
> >
> > 51: (b7) r2 = 16
> > 52: (85) call bpf_get_current_comm#16
> > last_idx 52 first_idx 42
> > regs=4 stack=0 before 51: (b7) r2 = 16
> > ; bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
> > 53: (bf) r1 = r6
> > 54: (18) r2 = 0xffff8881f3868800
> > 56: (18) r3 = 0xffffffff
> > 58: (bf) r4 = r7
> > 59: (b7) r5 = 32
> > 60: (85) call bpf_perf_event_output#25
> > last_idx 60 first_idx 53
> > regs=20 stack=0 before 59: (b7) r5 = 32
> > 61: (bf) r2 = r10
> > ; event.pid = pid;
> > 62: (07) r2 += -16
> > ; bpf_map_delete_elem(&start, &pid);
> > 63: (18) r1 = 0xffff8881f3868000
> > 65: (85) call bpf_map_delete_elem#3
> > ; }
> > 66: (b7) r0 = 0
> > 67: (95) exit
> >
> > from 44 to 66: safe
> >
> > from 34 to 66: safe
> >
> > from 11 to 28: R1_w=inv0 R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-8=mmmm???? fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
> > ; bpf_map_update_elem(&start, &pid, &ts, 0);
> > 28: (bf) r2 = r10
> > ;
> > 29: (07) r2 += -16
> > ; tsp = bpf_map_lookup_elem(&start, &pid);
> > 30: (18) r1 = 0xffff8881f3868000
> > 32: (85) call bpf_map_lookup_elem#1
> > invalid indirect read from stack off -16+0 size 4
> > processed 65 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 4
> >
> > Notice how there is a successful code path from instruction 0 through 67, a
> > few successfully verified jumps (44->66, 34->66), and only after that the
> > 11->28 jump plus the error on instruction #32.
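For context, the kind of runqslower modification described above looks
roughly like the sketch below. This is a minimal illustration, not the
actual diff; the includes and the `start` map definition are assumed from
runqslower's usual CO-RE setup. The only point is that `pid` is never
written before its address is passed as the map key, which the verifier
rejects as an indirect read of uninitialized stack memory:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* hash keyed by pid, mirroring runqslower's `start` map (assumed layout) */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, u32);
	__type(value, u64);
} start SEC(".maps");

SEC("tp_btf/sched_switch")
int handle__sched_switch(u64 *ctx)
{
	u64 *tsp;
	u32 pid;	/* intentionally left uninitialized */

	/* &pid points at stack memory that was never written, so the
	 * verifier fails this call with "invalid indirect read from stack"
	 */
	tsp = bpf_map_lookup_elem(&start, &pid);
	if (!tsp)
		return 0;

	return 0;
}

char LICENSE[] SEC("license") = "GPL";

Loading a program of this shape fails at the bpf_map_lookup_elem() call,
which is the "invalid indirect read from stack" error shown in the logs.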
> >
> > AFTER this change (full verifier log, **no truncation**):
> >
> > ; int handle__sched_switch(u64 *ctx)
> > 0: (bf) r6 = r1
> > ; struct task_struct *prev = (struct task_struct *)ctx[1];
> > 1: (79) r1 = *(u64 *)(r6 +8)
> > func 'sched_switch' arg1 has btf_id 151 type STRUCT 'task_struct'
> > 2: (b7) r2 = 0
> > ; struct event event = {};
> > 3: (7b) *(u64 *)(r10 -24) = r2
> > last_idx 3 first_idx 0
> > regs=4 stack=0 before 2: (b7) r2 = 0
> > 4: (7b) *(u64 *)(r10 -32) = r2
> > 5: (7b) *(u64 *)(r10 -40) = r2
> > 6: (7b) *(u64 *)(r10 -48) = r2
> > ; if (prev->state == TASK_RUNNING)
> > 7: (79) r2 = *(u64 *)(r1 +16)
> > ; if (prev->state == TASK_RUNNING)
> > 8: (55) if r2 != 0x0 goto pc+19
> > R1_w=ptr_task_struct(id=0,off=0,imm=0) R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
> > ; trace_enqueue(prev->tgid, prev->pid);
> > 9: (61) r1 = *(u32 *)(r1 +1184)
> > 10: (63) *(u32 *)(r10 -4) = r1
> > ; if (!pid || (targ_pid && targ_pid != pid))
> > 11: (15) if r1 == 0x0 goto pc+16
> >
> > from 11 to 28: R1_w=inv0 R2_w=inv0 R6_w=ctx(id=0,off=0,imm=0) R10=fp0 fp-8=mmmm???? fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000
> > ; bpf_map_update_elem(&start, &pid, &ts, 0);
> > 28: (bf) r2 = r10
> > ;
> > 29: (07) r2 += -16
> > ; tsp = bpf_map_lookup_elem(&start, &pid);
> > 30: (18) r1 = 0xffff8881db3ce800
> > 32: (85) call bpf_map_lookup_elem#1
> > invalid indirect read from stack off -16+0 size 4
> > processed 65 insns (limit 1000000) max_states_per_insn 1 total_states 5 peak_states 5 mark_read 4
> >
> > Notice how in this case only instructions 0-11 are recorded, plus the jump
> > from 11 to 28, plus instructions 28-32 with the error on insn #32.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>
>
> This is a great idea!
> Thanks!
> But two test_verifier tests failed:

My bad, I forgot to run test_verifier and test_maps. Will take a look
and fix.

> #722/p precise: ST insn causing spi > allocated_stack FAIL
> Unexpected verifier log in successful load!
> EXP: 5: (2d) if r4 > r0 goto pc+0
> RES:
> 0: (bf) r3 = r10
> 1: (55) if r3 != 0x7b goto pc+0
>
> from 1 to 2: safe
> processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 4
> peak_states 4 mark_read 1
>
> Please fix them up.