On 12/3/23 3:48 PM, Jiri Olsa wrote:
Lee pointed out issue found by syscaller [0] hitting BUG in prog array map poke update in prog_array_map_poke_run function due to error value returned from bpf_arch_text_poke function. There's race window where bpf_arch_text_poke can fail due to missing bpf program kallsym symbols, which is accounted for with check for -EINVAL in that BUG_ON call. The problem is that in such case we won't update the tail call jump and cause imbalance for the next tail call update check which will fail with -EBUSY in bpf_arch_text_poke. I'm hitting following race during the program load: CPU 0 CPU 1 bpf_prog_load bpf_check do_misc_fixups prog_array_map_poke_track map_update_elem bpf_fd_array_map_update_elem prog_array_map_poke_run bpf_arch_text_poke returns -EINVAL bpf_prog_kallsyms_add After bpf_arch_text_poke (CPU 1) fails to update the tail call jump, the next poke update fails on expected jump instruction check in bpf_arch_text_poke with -EBUSY and triggers the BUG_ON in prog_array_map_poke_run. Similar race exists on the program unload. Fixing this by moving the update to bpf_arch_poke_desc_update function which makes sure we call __bpf_arch_text_poke that skips the bpf address check. Each architecture has slightly different approach wrt looking up bpf address in bpf_arch_text_poke, so instead of splitting the function or adding new 'checkip' argument in previous version, it seems best to move the whole map_poke_run update as arch specific code. [0] https://syzkaller.appspot.com/bug?extid=97a4fe20470e9bc30810 Cc: Lee Jones <lee@xxxxxxxxxx> Cc: Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT") Reported-by: syzbot+97a4fe20470e9bc30810@xxxxxxxxxxxxxxxxxxxxxxxxx Tested-by: Lee Jones <lee@xxxxxxxxxx> Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
In bpf_arch_text_poke(), we also have if (is_endbr(*(u32 *)ip)) ip += ENDBR_INSN_SIZE; Since the indirect call be converted to direct call, so I think we do not need endbr insn at the beginning of the function even if jit might have added endbr or some other stuff in the beginning of the function. See https://lore.kernel.org/bpf/20231130133630.192490507@xxxxxxxxxxxxx/ The patch looks good to me. Acked-by: Yonghong Song <yonghong.song@xxxxxxxxx>