Re: [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps

Song Liu <song@xxxxxxxxxx> · Tue, 22 Feb 2022 23:29:53 -0800

On Tue, Feb 22, 2022 at 12:21 AM Kumar Kartikeya Dwivedi
<memxor@xxxxxxxxx> wrote:
>
[...]

> >
> > I guess I missed some context here. Could you please provide some references
> > to the use cases of these features?
> >
>
> The common use case is caching references to objects inside BPF maps, to avoid
> costly lookups, and being able to raise the refcount once for the duration of a
> program invocation when passing the object to multiple helpers (to avoid further
> re-lookups). Storing references also allows you to control object lifetime.
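To illustrate the ownership model described above, here is a minimal userspace C11 sketch. It is not BPF code and not the API from this series: a map value holds a referenced pointer, and an atomic exchange moves ownership of the reference into and out of the slot, which is roughly how an xchg-style operation lets the map control object lifetime. The names `struct obj`, `obj_new`, `obj_release`, `struct map_value` and `slot_xchg` are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a kernel object. */
struct obj {
    atomic_int refcnt;
    int id;
};

/* Allocate an object holding one reference, owned by the caller. */
static struct obj *obj_new(int id)
{
    struct obj *o = malloc(sizeof(*o));
    if (!o)
        return NULL;
    atomic_init(&o->refcnt, 1);
    o->id = id;
    return o;
}

/* Drop one reference; dropping the last one frees the object. NULL is a no-op. */
static void obj_release(struct obj *o)
{
    if (o && atomic_fetch_sub(&o->refcnt, 1) == 1)
        free(o);
}

/* Hypothetical map value with a single slot for a referenced pointer. */
struct map_value {
    _Atomic(struct obj *) ptr;
};

/*
 * Move the caller's reference to @o into the slot and return the previous
 * occupant (or NULL). The caller now owns the returned reference and must
 * eventually release it or store it again.
 */
static struct obj *slot_xchg(struct map_value *v, struct obj *o)
{
    return atomic_exchange(&v->ptr, o);
}
```

Stashing an object is `obj_release(slot_xchg(&v, o));`; taking it back out for use is `o = slot_xchg(&v, NULL);`, after which the caller must release it or put it back. Because ownership moves with the exchange, no reference is ever leaked or double-dropped.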
>
> Another use case is enabling xdp_frame queueing in XDP, but that still needs
> some integration work after this lands, so it's a bit early to comment on the
> specifics.
>
> Beyond that, as Alexei already mentioned, this could easily be extended in the
> future to support memory allocation in a BPF program that returns a
> PTR_TO_BTF_ID [0].
>
>   [0]: https://lore.kernel.org/bpf/20220216230615.po6huyrgkswk7u67@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> > For unreferenced kernel pointers and userspace pointers, it seems that there is
> > no guarantee the pointer will still be valid during access (we only know it is
> > valid when it is stored in the map). Is this correct?
> >
>
> That is correct. For both unreferenced and referenced kernel pointers, a
> BPF_LDX marks the loaded pointer as PTR_UNTRUSTED, and it cannot be passed into
> helpers or kfuncs, because from that point onwards we cannot claim that the
> object is still alive when the pointer is used later. Dereferencing is still
> permitted, because the verifier handles faults from bad accesses by applying
> the PROBE_MEM conversion to PTR_TO_BTF_ID loads in convert_ctx_accesses (the
> JIT later detects this and builds the exception table used by the exception
> handler).
>
> For unreferenced pointers, you sometimes know that the pointer will stay
> valid, so you can simply store it in the map, load it, and access it directly;
> this imposes very few restrictions.
>
> For the referenced case, you could say that BPF_LDX marking the pointer as
> PTR_UNTRUSTED makes it a lot less useful: even if the BPF program already holds
> a reference, just to make sure I _read valid data_ I still have to use the
> kptr_get style helper to raise and put a reference, ensuring the object is
> alive when it is accessed.
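For illustration only, a kptr_get-style acquire can be sketched in userspace C11 as "load the stashed pointer, then raise the refcount only if it has not already dropped to zero". This is an assumed general pattern, not the helper from this series; `struct obj`, `struct map_value`, `slot_get` and `obj_put` are hypothetical names, and the sketch simply assumes the object's memory stays valid between the load and the increment (in the kernel that guarantee would have to come from something like RCU).

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object and map slot (illustrative names). */
struct obj {
    atomic_int refcnt;
    int id;
};

struct map_value {
    _Atomic(struct obj *) ptr;
};

/*
 * kptr_get-style acquire: load the stashed pointer and take a new reference
 * only if the object is still alive (refcnt != 0).
 * ASSUMPTION: the object's memory stays valid between the load and the
 * increment; this sketch does not provide that guarantee itself.
 */
static struct obj *slot_get(struct map_value *v)
{
    struct obj *o = atomic_load(&v->ptr);
    if (!o)
        return NULL;

    int old = atomic_load(&o->refcnt);
    while (old != 0) {
        /* On failure, 'old' is reloaded with the current count and we retry. */
        if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
            return o;
    }
    return NULL; /* refcount already hit zero: object is being torn down */
}

/* Drop a reference taken by slot_get(). Freeing on the last put is omitted. */
static void obj_put(struct obj *o)
{
    atomic_fetch_sub(&o->refcnt, 1);
}
```

The raise/put pair around every access is exactly the per-access cost being discussed here; the increment-if-not-zero loop is what prevents resurrecting an object whose last reference is already gone.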
>
> In the referenced case, for RCU-protected objects the actual release should
> still wait until the BPF program hits BPF_EXIT. For other cases, such as
> sleepable programs or objects whose lifetime is managed by the refcount alone,
> you can also detect the presence of a writer in another BPF program (i.e.
> whether the pointer was xchg'd out during our access) using a seqlock-style
> scheme:
>
>         v = bpf_map_lookup_elem(&map, ...);
>         if (!v)
>                 return 0;
>         seq_begin = v->seq;
>         atomic_thread_fence(memory_order_acquire); // A
>         <do access>
>         atomic_thread_fence(memory_order_acquire); // B
>         seq_end = v->seq;
>         if (seq_begin & 1 || seq_begin != seq_end)
>                 goto bad_read;
>         <use data>
>
> Of course, barriers are not yet available in BPF, but you get the idea (it
> should work on x86). The updater BPF program increments v->seq before and after
> the xchg, ensuring proper ordering. v->seq starts at 0, so an odd value
> indicates that a writer update is in progress.
>
> This lets you avoid raising the refcount while still ensuring that the object
> was valid for the whole window between A and B in which it was accessed. Even
> though raising an uncontended refcount is cheap, this is much cheaper.
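The seqlock-style scheme above can be approximated in userspace with C11 atomics, where the fences do exist. This is a sketch under stated assumptions: a single updater, hypothetical names (`struct obj`, `struct map_value`, `update_xchg`, `read_field`), and objects that are never freed, standing in for the PROBE_MEM fault handling that makes stale reads safe for BPF programs. Data is read with relaxed atomics so the sketch itself is free of data races.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object reachable through the map value's pointer. */
struct obj {
    atomic_int field;
};

/* Hypothetical map value: sequence counter plus the stashed pointer. */
struct map_value {
    atomic_uint seq;           /* even: stable, odd: update in progress */
    _Atomic(struct obj *) ptr;
};

/* Single updater: bump seq to odd, swap the pointer, bump seq back to even. */
static struct obj *update_xchg(struct map_value *v, struct obj *new_obj)
{
    unsigned s = atomic_load_explicit(&v->seq, memory_order_relaxed);

    atomic_store_explicit(&v->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release); /* odd seq ordered before the swap */
    struct obj *old = atomic_exchange_explicit(&v->ptr, new_obj,
                                               memory_order_relaxed);
    atomic_store_explicit(&v->seq, s + 2, memory_order_release);
    return old;
}

/* Reader: succeeds only if no update overlapped the access between A and B. */
static bool read_field(struct map_value *v, int *out)
{
    unsigned seq_begin = atomic_load_explicit(&v->seq, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);                 /* A */
    struct obj *o = atomic_load_explicit(&v->ptr, memory_order_relaxed);
    int val = o ? atomic_load_explicit(&o->field, memory_order_relaxed) : 0;
    atomic_thread_fence(memory_order_acquire);                 /* B */
    unsigned seq_end = atomic_load_explicit(&v->seq, memory_order_relaxed);

    /* Odd or changed sequence means a writer raced with us: bad read. */
    if ((seq_begin & 1) || seq_begin != seq_end || !o)
        return false;
    *out = val;
    return true;
}
```

With more than one updater, the writer side would additionally need to be serialized (for example with a lock), since two concurrent read-increment-store sequences on seq could interleave.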
>
> The userspace pointer case is different: it sets the MEM_USER flag, so you
> can't even dereference it, and the only useful thing to do with it is to call
> bpf_probe_read_user. You are right that in most cases a userspace pointer won't
> be useful, but for some cooperative cases between a BPF program and a userspace
> thread, it can act as a way to share certain thread-local areas or other
> userspace memory, which the BPF program can then store keyed by the
> task_struct *, where sharing memory through a BPF map is not always possible.

Thanks for the explanation! I can see the referenced kernel pointer being very
powerful in many use cases. The per-CPU pointer is also interesting.

Song