On Wed, Jun 9, 2021 at 5:05 AM Ravi Bangoria <ravi.bangoria@xxxxxxxxxxxxx> wrote:
>
> Hi Alexei,
>
> On 7/28/20 8:51 PM, Jean-Philippe Brucker wrote:
> > The following patch adds support for BPF_PROBE_MEM on arm64. The
> > implementation is simple but I wanted to give a bit of background first.
> > If you're familiar with recent BPF development you can skip to the patch
> > (or fact-check the following blurb).
> >
> > BPF programs used for tracing can inspect any of the traced function's
> > arguments and follow pointers in struct members. Traditionally the BPF
> > program would get a struct pt_regs as argument and cast the register
> > values to the appropriate struct pointer. The BPF verifier would mandate
> > that any memory access uses the bpf_probe_read() helper, to suppress
> > page faults (see samples/bpf/tracex1_kern.c).
> >
> > With BPF Type Format embedded into the kernel (CONFIG_DEBUG_INFO_BTF),
> > the verifier can now check the type of any access performed by a BPF
> > program. It rejects for example programs that cast to a different
> > structure and perform out-of-bounds accesses, or programs that attempt
> > to dereference something that isn't a pointer, or that hasn't gone
> > through a NULL check.
> >
> > As this makes tracing programs safer, the verifier now allows loading
> > programs that access struct members without bpf_probe_read(). It is
> > however still possible to trigger page faults. For example in the
> > following example with which I've tested this patch, the verifier does
> > not mandate a NULL check for the second-level pointer:
> >
> >     /*
> >      * From tools/testing/selftests/bpf/progs/bpf_iter_task.c
> >      * dump_task() is called for each task.
> >      */
> >     SEC("iter/task")
> >     int dump_task(struct bpf_iter__task *ctx)
> >     {
> >         struct seq_file *seq = ctx->meta->seq;
> >         struct task_struct *task = ctx->task;
> >
> >         /* Program would be rejected without this check */
> >         if (task == NULL)
> >             return 0;
> >
> >         /*
> >          * However the verifier does not currently mandate
> >          * checking task->mm, and the following faults for kernel
> >          * threads.
> >          */
> >         BPF_SEQ_PRINTF(seq, "pid=%d vm=%d", task->pid, task->mm->total_vm);
> >         return 0;
> >     }
> >
> > Even if it checked this case, the verifier couldn't guarantee that all
> > accesses are safe since kernel structures could in theory contain
> > garbage or error pointers. So to allow fast access without
> > bpf_probe_read(), a JIT implementation must support BPF exception
> > tables. For each access to a BTF pointer, the JIT generates an entry
> > into an exception table appended to the BPF program. If the access
> > faults at runtime, the handler skips the faulting instruction. The
> > example above will display vm=0 for kernel threads.
>
> I'm trying with the example above (task->mm->total_vm) on x86 machine
> with bpf/master (11fc79fc9f2e3) plus commit 4c5de127598e1 ("bpf: Emit
> explicit NULL pointer checks for PROBE_LDX instructions.") *reverted*,
> I'm seeing the app getting killed with error in dmesg.
>
>   $ sudo bpftool iter pin bpf_iter_task.o /sys/fs/bpf/task
>   $ sudo cat /sys/fs/bpf/task
>   Killed
>
>   $ dmesg
>   [ 188.810020] BUG: kernel NULL pointer dereference, address: 00000000000000c8
>   [ 188.810030] #PF: supervisor read access in kernel mode
>   [ 188.810034] #PF: error_code(0x0000) - not-present page
>
> IIUC, this should be handled by bpf exception table rather than killing
> the app. Am I missing anything?

For PROBE_LDX the verifier guarantees that the address is either a very
likely valid kernel address or NULL. On x86 the user and kernel address
spaces are shared and NULL is a user address, so there cannot be an
exception table for NULL.
Hence the x86-64 JIT inserts a NULL check when it converts PROBE_LDX into
a load insn.
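
To make the effect of that NULL check concrete, here is a rough user-space sketch in C of the behaviour described above. It is not the JIT code: struct fake_mm, struct fake_task, probe_ldx_u64(), task_total_vm() and print_task() are hypothetical names invented for illustration, and the structure layouts are not the kernel's. The only thing it models is that the base pointer is tested before the load and that a NULL base yields 0, which matches the vm=0 result the quoted text expects for kernel threads.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for mm_struct and task_struct (not the real layouts). */
struct fake_mm {
	uint64_t total_vm;
};

struct fake_task {
	int pid;
	struct fake_mm *mm;	/* NULL for a kernel thread */
};

/*
 * Model of a PROBE_LDX load with an explicit NULL check: the base pointer is
 * tested first, and the destination is zeroed instead of reading base + off.
 * Without the check, a NULL base would turn into a read of address `off`.
 */
uint64_t probe_ldx_u64(const void *base, size_t off)
{
	uint64_t val = 0;

	if (base == NULL)
		return 0;

	memcpy(&val, (const char *)base + off, sizeof(val));
	return val;
}

/* Equivalent of task->mm->total_vm, with the second-level load guarded. */
uint64_t task_total_vm(const struct fake_task *task)
{
	/* task itself is assumed non-NULL, as the verifier mandates that check. */
	return probe_ldx_u64(task->mm, offsetof(struct fake_mm, total_vm));
}

/* Prints one line per task in the same shape as the BPF_SEQ_PRINTF call. */
void print_task(const struct fake_task *task)
{
	printf("pid=%d vm=%llu\n", task->pid,
	       (unsigned long long)task_total_vm(task));
}
```

Calling print_task() on a fake task whose mm is NULL goes through the guarded path and reports vm=0 rather than faulting. In the dmesg output above the faulting address is 0xc8, which is presumably NULL plus the offset of total_vm in mm_struct on that build, i.e. the read the check is there to avoid.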