On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@xxxxxxxxxx> wrote:
>
> Pointer to BTF object is a pointer to kernel object or NULL.
> Such pointers can only be used by BPF_LDX instructions.
> The verifier changed their opcode from LDX|MEM|size
> to LDX|PROBE_MEM|size to make JITing easier.
> The number of entries in extable is the number of BPF_LDX insns
> that access kernel memory via "pointer to BTF type".
> Only these load instructions can fault.
> Since x86 extable is relative it has to be allocated in the same
> memory region as JITed code.
> Allocate it prior to last pass of JITing and let the last pass populate it.
> Pointer to extable in bpf_prog_aux is necessary to make page fault
> handling fast.
> Page fault handling is done in two steps:
> 1. bpf_prog_kallsyms_find() finds BPF program that page faulted.
>    It's done by walking rb tree.
> 2. then extable for given bpf program is binary searched.
> This process is similar to how page faulting is done for kernel modules.
> The exception handler skips over faulting x86 instruction and
> initializes destination register with zero. This mimics exact
> behavior of bpf_probe_read (when probe_kernel_read faults dest is zeroed).
>
> JITs for other architectures can add support in similar way.
> Until then they will reject unknown opcode and fallback to interpreter.
>
> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> ---
>  arch/x86/net/bpf_jit_comp.c | 96 +++++++++++++++++++++++++++++++++++--
>  include/linux/bpf.h         |  3 ++
>  include/linux/extable.h     | 10 ++++
>  kernel/bpf/core.c           | 20 +++++++-
>  kernel/bpf/verifier.c       |  1 +
>  kernel/extable.c            |  2 +
>  6 files changed, 127 insertions(+), 5 deletions(-)
>

This is surprisingly easy to follow :) Looks good overall, just one
concern about 32-bit distance between ex_handler_bpf and BPF jitted
program below. And I agree with Eric, probably need to ensure proper
alignment for exception_table_entry array.

[...]
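
To make the 32-bit distance concern concrete, here is a minimal
user-space sketch of a self-relative table field. It is an illustration
only, not code from the patch: struct rel_entry, in_simm32_range(),
rel_store() and rel_load() are hypothetical names, and addresses are
passed as plain 64-bit integers so the far case can be simulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a relative extable entry: three 32-bit fields. */
struct rel_entry {
	int32_t insn;
	int32_t fixup;
	int32_t handler;
};

/* True if value can be stored in a signed 32-bit field without truncation. */
static bool in_simm32_range(int64_t value)
{
	return value >= INT32_MIN && value <= INT32_MAX;
}

/*
 * Store target as an offset relative to the address of the field itself.
 * field_addr is the (simulated) address at which *field lives.
 * Returns false when the two addresses are more than +/-2GB apart.
 */
static bool rel_store(int32_t *field, uint64_t field_addr, uint64_t target)
{
	/* Unsigned subtraction wraps, so the signed view is the true distance. */
	int64_t delta = (int64_t)(target - field_addr);

	if (!in_simm32_range(delta))
		return false;
	*field = (int32_t)delta;
	return true;
}

/* Recover the absolute address from a relative field. */
static uint64_t rel_load(const int32_t *field, uint64_t field_addr)
{
	return field_addr + (uint64_t)(int64_t)*field;
}
```

With this layout rel_store() fails as soon as the handler and the
dynamically allocated entry are more than 2GB apart, which is the case
the is_simm32() checks in the hunk below guard against.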
> @@ -805,6 +835,48 @@ stx:			if (is_imm8(insn->off))
>  			else
>  				EMIT1_off32(add_2reg(0x80, src_reg, dst_reg),
>  					    insn->off);
> +			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
> +				struct exception_table_entry *ex;
> +				u8 *_insn = image + proglen;
> +				s64 delta;
> +
> +				if (!bpf_prog->aux->extable)
> +					break;
> +
> +				if (excnt >= bpf_prog->aux->num_exentries) {
> +					pr_err("ex gen bug\n");

This should never happen, right? BUG()?

> +					return -EFAULT;
> +				}
> +				ex = &bpf_prog->aux->extable[excnt++];
> +
> +				delta = _insn - (u8 *)&ex->insn;
> +				if (!is_simm32(delta)) {
> +					pr_err("extable->insn doesn't fit into 32-bit\n");
> +					return -EFAULT;
> +				}
> +				ex->insn = delta;
> +
> +				delta = (u8 *)ex_handler_bpf - (u8 *)&ex->handler;

How likely is it that the global ex_handler_bpf will be close enough to
a dynamically allocated exception_table_entry?

> +				if (!is_simm32(delta)) {
> +					pr_err("extable->handler doesn't fit into 32-bit\n");
> +					return -EFAULT;
> +				}
> +				ex->handler = delta;
> +
> +				if (dst_reg > BPF_REG_9) {
> +					pr_err("verifier error\n");
> +					return -EFAULT;
> +				}
> +				/*
> +				 * Compute size of x86 insn and its target dest x86 register.
> +				 * ex_handler_bpf() will use lower 8 bits to adjust
> +				 * pt_regs->ip to jump over this x86 instruction
> +				 * and upper bits to figure out which pt_regs to zero out.
> +				 * End result: x86 insn "mov rbx, qword ptr [rax+0x14]"
> +				 * of 4 bytes will be ignored and rbx will be zero inited.
> +				 */
> +				ex->fixup = (prog - temp) | (reg2pt_regs[dst_reg] << 8);
> +			}
>  			break;
>
>  		/* STX XADD: lock *(u32*)(dst_reg + off) += src_reg */
> @@ -1058,6 +1130,11 @@ xadd:			if (is_imm8(insn->off))
>  		addrs[i] = proglen;
>  		prog = temp;
>  	}
> +
> +	if (image && excnt != bpf_prog->aux->num_exentries) {
> +		pr_err("extable is not populated\n");

Isn't this a plain BUG()?

> +		return -EFAULT;
> +	}
>  	return proglen;
>  }
>

[...]
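
For anyone following along, the fixup packing described in the comment
above (instruction length in the low 8 bits, register offset in the
upper bits) can be sketched in user space roughly like this. It is a
sketch under stated assumptions: struct fake_regs is a hypothetical,
cut-down stand-in for pt_regs, and fixup_encode() / fixup_apply() are
made-up helper names, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down register file; the real struct pt_regs is larger. */
struct fake_regs {
	uint64_t rax;
	uint64_t rbx;
	uint64_t ip;
};

/*
 * Pack the x86 instruction length into the low 8 bits and the byte
 * offset of the destination register within the register file into
 * the bits above them.
 */
static uint32_t fixup_encode(uint32_t insn_len, uint32_t reg_offset)
{
	assert(insn_len <= 0xff);       /* must fit in the low byte */
	assert(reg_offset <= 0xffffff); /* must survive the shift by 8 */
	return insn_len | (reg_offset << 8);
}

/*
 * What a fault handler would do with the packed value: zero the
 * destination register and step the instruction pointer past the
 * faulting load.
 */
static void fixup_apply(struct fake_regs *regs, uint32_t fixup)
{
	uint64_t zero = 0;

	memcpy((unsigned char *)regs + (fixup >> 8), &zero, sizeof(zero));
	regs->ip += fixup & 0xff;
}
```

Taking the example from the comment, a 4-byte load into rbx would be
encoded as fixup_encode(4, offsetof(struct fake_regs, rbx)); applying it
clears rbx and advances ip by 4, leaving the other registers untouched.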