Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Thu, 28 Jan 2021 17:08:24 -0800

On Fri, Jan 29, 2021 at 02:03:02AM +0100, Jann Horn wrote:
> On Fri, Jan 29, 2021 at 1:43 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> > On Fri, Jan 29, 2021 at 01:35:16AM +0100, Jann Horn wrote:
> > > On Fri, Jan 29, 2021 at 1:11 AM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > On Thu, Jan 28, 2021 at 03:51:13PM -0800, Andy Lutomirski wrote:
> > > > > Okay, so I guess you're trying to inline probe_read_kernel().  But
> > > > > that means you have to inline a valid implementation.  In particular,
> > > > > you need to check that you're accessing *kernel* memory.  Just like
> > > >
> > > > That check is on the verifier side. It only does it for kernel
> > > > pointers with known types.
> > > > In a sequnce a->b->c the verifier guarantees that 'a' is valid
> > > > kernel pointer and it's also !null. Then it guarantees that offsetof(b)
> > > > points to valid kernel field which is also a pointer.
> > > > What it doesn't check that b != null, so
> > > > that users don't have to write silly code with 'if (p)' after every
> > > > dereference.
> > >
> > > How is that supposed to work? If I e.g. have a pointer to a
> > > task_struct, and I do something like:
> > >
> > > task->mm->mmap->vm_file->f_inode
> > >
> > > and another thread concurrently mutates the VMA tree and frees the VMA
> > > that we're traversing here, how can BPF guarantee that
> > > task->mm->mmap->vm_file is a valid pointer and not whatever garbage we
> > > read from freed memory?
> >
> > Please read upthread. Every -> is replaced with probe_kernel_read.
> > That's what was kprobes were doing for years. That's what bpf was
> > doing for years.
> 
> Uh... but -> on PTR_TO_BTF_ID pointers is not replaced with
> probe_kernel_read() and can be done directly with BPF_LDX, right?

Almost. They're replaced with BPF_LDX | BPF_PROBE_MEM.
The interpreter is calling bpf_probe_read_kernel() to process them.
JIT is replacing these insns with atomic loads and extable.

> And
> dereferencing a PTR_TO_BTF_ID pointer returns another PTR_TO_BTF_ID
> pointer if type information is available, right? (See
> btf_struct_access().) And stuff like BPF LSM programs or some of the
> XDP stuff receives BTF-typed pointers to kernel data structures as
> arguments, right?
> 
> And as an example, this is visible in
> tools/testing/selftests/bpf/progs/ima.c , which does:
> 
> SEC("lsm.s/bprm_committed_creds")
> int BPF_PROG(ima, struct linux_binprm *bprm)
> {
>   u32 pid = bpf_get_current_pid_tgid() >> 32;
> 
>   if (pid == monitored_pid)
>     ima_hash_ret = bpf_ima_inode_hash(bprm->file->f_inode,
>            &ima_hash, sizeof(ima_hash));
> 
>   return 0;
> }
> 
> As far as I can tell, we are getting a BTF-typed pointer "bprm", doing
> dereferences that again yield BTF-typed pointers, and then pass the
> BTF-typed pointer at the end of it into bpf_ima_inode_hash()?

correct. all of them bpf_probe_read_kernel() == copy_from_kernel_nofault().