On 10/9/19 10:38 AM, Andrii Nakryiko wrote:
> On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@xxxxxxxxxx> wrote:
>>
>> Pointer to BTF object is a pointer to kernel object or NULL.
>> Such pointers can only be used by BPF_LDX instructions.
>> The verifier changed their opcode from LDX|MEM|size
>> to LDX|PROBE_MEM|size to make JITing easier.
>> The number of entries in extable is the number of BPF_LDX insns
>> that access kernel memory via "pointer to BTF type".
>> Only these load instructions can fault.
>> Since x86 extable is relative it has to be allocated in the same
>> memory region as JITed code.
>> Allocate it prior to last pass of JITing and let the last pass populate it.
>> Pointer to extable in bpf_prog_aux is necessary to make page fault
>> handling fast.
>> Page fault handling is done in two steps:
>> 1. bpf_prog_kallsyms_find() finds BPF program that page faulted.
>>    It's done by walking rb tree.
>> 2. then extable for given bpf program is binary searched.
>> This process is similar to how page faulting is done for kernel modules.
>> The exception handler skips over faulting x86 instruction and
>> initializes destination register with zero. This mimics exact
>> behavior of bpf_probe_read (when probe_kernel_read faults dest is zeroed).
>>
>> JITs for other architectures can add support in similar way.
>> Until then they will reject unknown opcode and fallback to interpreter.
>>
>> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
>> ---
>>  arch/x86/net/bpf_jit_comp.c | 96 +++++++++++++++++++++++++++++++++++--
>>  include/linux/bpf.h         |  3 ++
>>  include/linux/extable.h     | 10 ++++
>>  kernel/bpf/core.c           | 20 +++++++-
>>  kernel/bpf/verifier.c       |  1 +
>>  kernel/extable.c            |  2 +
>>  6 files changed, 127 insertions(+), 5 deletions(-)
>>
>
> This is surprisingly easy to follow :) Looks good overall, just one
> concern about 32-bit distance between ex_handler_bpf and BPF jitted
> program below. And I agree with Eric, probably need to ensure proper
> alignment for exception_table_entry array.

already fixed.

> [...]
>
>> @@ -805,6 +835,48 @@ stx:                    if (is_imm8(insn->off))
>>                          else
>>                                  EMIT1_off32(add_2reg(0x80, src_reg, dst_reg),
>>                                              insn->off);
>> +                       if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
>> +                               struct exception_table_entry *ex;
>> +                               u8 *_insn = image + proglen;
>> +                               s64 delta;
>> +
>> +                               if (!bpf_prog->aux->extable)
>> +                                       break;
>> +
>> +                               if (excnt >= bpf_prog->aux->num_exentries) {
>> +                                       pr_err("ex gen bug\n");
>
> This should never happen, right? BUG()?

absolutely not. No BUGs in kernel for things like this.
If kernel can continue it should.

>> +                                       return -EFAULT;
>> +                               }
>> +                               ex = &bpf_prog->aux->extable[excnt++];
>> +
>> +                               delta = _insn - (u8 *)&ex->insn;
>> +                               if (!is_simm32(delta)) {
>> +                                       pr_err("extable->insn doesn't fit into 32-bit\n");
>> +                                       return -EFAULT;
>> +                               }
>> +                               ex->insn = delta;
>> +
>> +                               delta = (u8 *)ex_handler_bpf - (u8 *)&ex->handler;
>
> how likely it is that global ex_handle_bpf will be close enough to
> dynamically allocated piece of exception_table_entry?

99.9% Since we rely on that in other places in the JIT.
See BPF_CALL, for example.
But I'd like to keep the check below. Just in case.
Same as in BPF_CALL.
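
For anyone following the thread who wants to see the relative-offset scheme in isolation, here is a minimal userspace sketch of it. It is not the kernel code: struct rel_entry and the helpers fits_s32(), rel_encode(), rel_decode() and rel_search() are names made up for this illustration. The idea it shows is the one discussed above: each table field stores a signed 32-bit distance from the field's own address to its target, encoding fails if that distance doesn't fit, and lookup is a binary search over the decoded instruction addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a relative exception table entry.
 * Each field holds a signed 32-bit offset measured from the field's
 * own address, not an absolute pointer.
 */
struct rel_entry {
	int32_t insn;		/* offset to the instruction that may fault */
	int32_t handler;	/* offset to the handler to run on a fault */
};

/* Plays the role of the is_simm32() check in the quoted hunk. */
static int fits_s32(int64_t v)
{
	return v >= INT32_MIN && v <= INT32_MAX;
}

/*
 * Store 'target' as an offset relative to 'field'.
 * Returns 0 on success, -1 if the distance needs more than 32 bits.
 * Assumes ordinary user-space addresses, so the subtraction itself
 * cannot overflow.
 */
static int rel_encode(int32_t *field, const void *target)
{
	int64_t delta = (int64_t)((intptr_t)target - (intptr_t)field);

	if (!fits_s32(delta))
		return -1;
	*field = (int32_t)delta;
	return 0;
}

/* Recover the absolute address an encoded field points at. */
static uintptr_t rel_decode(const int32_t *field)
{
	return (uintptr_t)((intptr_t)field + *field);
}

/*
 * Binary search for the entry covering 'addr'.
 * Entries must be sorted by decoded insn address, which holds when
 * they are appended in instruction order.
 */
static const struct rel_entry *rel_search(const struct rel_entry *tbl,
					  size_t n, uintptr_t addr)
{
	size_t lo = 0, hi = n;

	assert(tbl != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		uintptr_t cur = rel_decode(&tbl[mid].insn);

		if (cur == addr)
			return &tbl[mid];
		if (cur < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}
```

In this sketch the table and the "code" bytes would live in one buffer, so the insn offsets are always small. The handler field is the case the question above is about: its target is outside that buffer, so rel_encode() can legitimately return -1 for it, and the caller has to treat that as an error rather than assume it fits.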