On Tue, Feb 22, 2022 at 12:21 AM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
>
[...]
> >
> > I guess I missed some context here. Could you please provide some reference
> > to the use cases of these features?
> >
>
> The common use case is caching references to objects inside BPF maps, to avoid
> costly lookups, and being able to raise it once for the duration of program
> invocation when passing it to multiple helpers (to avoid further re-lookups).
> Storing references also allows you to control object lifetime.
>
> One other use case is enabling xdp_frame queueing in XDP using this, but that
> still needs some integration work after this lands, so it's a bit early to
> comment on the specifics.
>
> Other than that, I think Alexei already mentioned this could be easily extended
> to do memory allocation returning a PTR_TO_BTF_ID in a BPF program [0] in the
> future.
>
> [0]: https://lore.kernel.org/bpf/20220216230615.po6huyrgkswk7u67@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> > For Unreferenced kernel pointer and userspace pointer, it seems that there is
> > no guarantee the pointer will still be valid during access (we only know it is
> > valid when it is stored in the map). Is this correct?
> >
>
> That is correct. In the case of unreferenced and referenced kernel pointers,
> when you do a BPF_LDX, both are marked as PTR_UNTRUSTED, and it is not allowed
> to pass them into helpers or kfuncs, because from that point onwards we cannot
> claim that the object is still alive when pointer is used later. Still,
> dereference is permitted because verifier handles faults for bad accesses using
> PROBE_MEM conversion for PTR_TO_BTF_ID loads in convert_ctx_accesses (which is
> then later detected by JIT to build exception table used by exception handler).
>
> In case of reading unreferenced pointer, in some cases you know that the pointer
> will stay valid, so you can just store it in the map and load and directly
> access it, it imposes very few restrictions.
>
> For the referenced case, and BPF_LDX marking it as PTR_UNTRUSTED, you could say
> that this makes it a lot less useful, because if BPF program already holds
> reference, just to make sure I _read valid data_, I still have to use the
> kptr_get style helper to raise and put reference to ensure the object is alive
> when it is accessed.
>
> So in that case, for RCU protected objects, it should still wait for BPF program
> to hit BPF_EXIT before the actual release, but for other cases like the case of
> sleepable programs, or objects where refcount alone manages lifetime, you can
> also detect writer presence of the other BPF program (to detect if pointer
> during our access was xchg'd out) using a seqlock style scheme:
>
>	v = bpf_map_lookup_elem(&map, ...);
>	if (!v)
>		return 0;
>	seq_begin = v->seq;
>	atomic_thread_fence(memory_order_acquire); // A
>	<do access>
>	atomic_thread_fence(memory_order_acquire); // B
>	seq_end = v->seq;
>	if (seq_begin & 1 || seq_begin != seq_end)
>		goto bad_read;
>	<use data>
>
> Of course, barriers are not yet in BPF, but you get the idea (it should work on
> x86). The updater BPF program will increment v->seq before and after xchg,
> ensuring proper ordering. v->seq starts as 0, so odd seq indicates writer update
> is in progress.
>
> This would allow you to not raise refcount, while still ensuring that as long as
> object was accessed, it was still valid between A and B. Even if raising
> uncontended refcount is cheap, this is much cheaper.
>
> The case of userspace pointer is different, it sets the MEM_USER flag, so the
> only useful thing to do is calling bpf_probe_read_user, you can't even
> dereference it.
> You are right that in most cases that userspace pointer won't be useful, but
> for some cooperative cases between BPF program and userspace thread, it can
> act as a way to share certain thread local areas/userspace memory that the BPF
> program can then store keyed by the task_struct *, where using a BPF map to
> share memory is not always possible.

Thanks for the explanation! I can see the referenced kernel pointer being very
powerful in many use cases. The per-CPU pointer is also interesting.

Song