On Sat, Aug 26, 2023 at 08:12:30PM +0200, Nam Cao wrote:
> On Sat, Aug 26, 2023 at 03:44:48PM +0200, Björn Töpel wrote:
> > Björn Töpel <bjorn@xxxxxxxxxx> writes:
> >
> > > I'm chasing a workqueue hang on RISC-V/qemu (TCG), using the bpf
> > > selftests on bpf-next 9e3b47abeb8f.
> > >
> > > I'm able to reproduce the hang by multiple runs of:
> > > | ./test_progs -a link_api -a linked_list
> > > I'm currently investigating that.
> >
> > +Guo for uprobe
> >
> > This was an interesting bug. The hang is an ebreak (RISC-V breakpoint)
> > that puts the kernel into an infinite loop.
> >
> > To reproduce, simply run the BPF selftest:
> > ./test_progs -v -a link_api -a linked_list
> >
> > First the link_api test is run, which exercises the uprobe
> > functionality. The link_api test completes, and test_progs still
> > has the uprobe active/enabled. Next the linked_list test triggers a
> > WARN_ON (which is implemented via ebreak as well).
> >
> > Now, handle_break() is entered, and uprobe_breakpoint_handler()
> > returns true, exiting handle_break(), which returns to the WARN
> > ebreak, and we have a merry-go-round.
> >
> > Lucky for the RISC-V folks, the BPF memory handler had a WARN that
> > surfaced the bug! ;-)
>
> Thanks for the analysis.
>
> I couldn't reproduce the problem, so I am just taking a guess here. The problem
> is because uprobes didn't find a probe point at that ebreak instruction. However,
> it also doesn't think an ebreak instruction is there, so it gets confused and just
> returns to the ebreak instruction, and then everything repeats.
>
> The reason why uprobes didn't think there is an ebreak instruction is that
> is_trap_insn() only returns true for a 32-bit ebreak, or for a 16-bit c.ebreak if
> the C extension is available, not both. So a 32-bit ebreak is not correctly
> recognized as a trap instruction.

I feel like I wasn't very clear with this: I was talking about handle_swbp() in
kernel/events/uprobes.c.
In this function, the call to find_active_uprobe() should return NULL, because
there is no probe at that ebreak. uprobes then checks whether the trap instruction
is still there by calling is_trap_insn(), which incorrectly says "no". So uprobes
assumes it is safe to simply return to that address. If is_trap_insn() correctly
returned true, uprobes would know that this is an ebreak but not a probe, and
would handle it correctly.

Best regards,
Nam
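For illustration only, here is a minimal standalone C sketch of the failure mode described above. It is not kernel code: riscv_is_ebreak(), c_only_is_trap() and no_uprobe_action() are hypothetical names. c_only_is_trap() models the C-extension-only check described above, and no_uprobe_action() is a simplified model that assumes the no-match path of handle_swbp() either raises SIGTRAP (trap instruction present) or restarts at the same address (trap instruction gone).

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard RISC-V encodings of the two breakpoint instructions. */
#define RV_EBREAK   0x00100073u /* 32-bit ebreak */
#define RV_C_EBREAK 0x9002u     /* 16-bit c.ebreak (C extension) */

/*
 * Hypothetical model of a check that only knows about c.ebreak, as
 * described for kernels built with the C extension. A 32-bit ebreak
 * (low halfword 0x0073) is not recognized.
 */
static bool c_only_is_trap(uint32_t insn)
{
	return (insn & 0xffff) == RV_C_EBREAK;
}

/*
 * Hypothetical width-aware check. Instructions whose low two bits are
 * 0b11 are 32-bit (ignoring the reserved longer encodings); anything
 * else is a 16-bit compressed instruction, so only the low halfword is
 * compared (the upper half may belong to the next instruction).
 */
static bool riscv_is_ebreak(uint32_t insn)
{
	if ((insn & 0x3) == 0x3)
		return insn == RV_EBREAK;
	return (insn & 0xffff) == RV_C_EBREAK;
}

/* Simplified model of handle_swbp() when no active uprobe is found. */
enum swbp_action { SWBP_RESTART, SWBP_SIGTRAP };

static enum swbp_action no_uprobe_action(bool (*is_trap)(uint32_t),
					 uint32_t insn_at_bp)
{
	/*
	 * Trap instruction present but no probe: report it with SIGTRAP.
	 * Otherwise assume the breakpoint was removed and restart at the
	 * same address, which repeats forever if the instruction is in
	 * fact an unrecognized ebreak.
	 */
	return is_trap(insn_at_bp) ? SWBP_SIGTRAP : SWBP_RESTART;
}
```

In this model, no_uprobe_action(c_only_is_trap, RV_EBREAK) yields SWBP_RESTART, i.e. the loop discussed in this thread, while no_uprobe_action(riscv_is_ebreak, RV_EBREAK) yields SWBP_SIGTRAP.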