On 11/4/22 3:42 AM, Kumar Kartikeya Dwivedi wrote:
> On Fri, Nov 04, 2022 at 11:26:39AM IST, Dave Marchevsky wrote:
>> On 11/3/22 3:10 PM, Kumar Kartikeya Dwivedi wrote:
>>> Add a linked list API for use in BPF programs, where it expects
>>> protection from the bpf_spin_lock in the same allocation as the
>>> bpf_list_head. Future patches will extend the same infrastructure to
>>> have different flavors with varying protection domains and visibility
>>> (e.g. percpu variant with local_t protection, usable in NMI progs).
>>>
>>> The following functions are added to kick things off:
>>>
>>>  bpf_list_push_front
>>>  bpf_list_push_back
>>>  bpf_list_pop_front
>>>  bpf_list_pop_back
>>>
>>> The lock protecting the bpf_list_head needs to be taken for all
>>> operations.
>>>
>>> Once a node has been added to the list, its pointer changes to
>>> PTR_UNTRUSTED. However, it is only released once the lock protecting
>>> the list is unlocked. For such local kptrs with PTR_UNTRUSTED set but
>>> an active ref_obj_id, it is still permitted to read and write to them
>>> as long as the lock is held.
>>
>> I think "still permitted to ... write to them" is not accurate
>> for this version of the series. In v2 you mentioned [0]:
>>
>> """
>> I have switched things a bit to disallow stores, which is a bug right
>> now in this set, because one can do this:
>>
>>   push_front(head, &p->node);
>>   p2 = container_of(pop_front(head));
>>   // p2 == p
>>   bpf_obj_drop(p2);
>>   p->data = ...;
>>
>> One can always fully initialize the object _before_ inserting it into
>> the list; in some cases that will be the requirement (like adding to
>> RCU protected lists) for correctness.
>> """
>>
>> I confirmed this is currently the case by moving the data write after
>> list_push in the selftest and running it:
>>
>>   @@ -87,8 +87,8 @@ static __always_inline int list_push_pop(struct bpf_spin_lock *lock,
>>    }
>>
>>    bpf_spin_lock(lock);
>>   -f->data = 13;
>>    bpf_list_push_front(head, &f->node);
>>   +f->data = 13;
>>    bpf_spin_unlock(lock);
>>
>> Got "only read is supported" from the verifier.
>> I think it's fine to punt on supporting writes for now and do it in
>> followups.
>>
>
> Thanks for catching it, I'll fix up the commit message.
>
> Also, just to manage expectations, I think enabling writes after
> pushing the object won't be possible to make safe, unless the
> definition of "safe" is twisted.
>
> As shown in that example, we can reach a point where the object has
> been freed but we hold an untrusted pointer to it. Once it has been
> freed, the object can be reallocated and be in use again concurrently,
> possibly as a different type.

I think this patchset prevents that example from passing the verifier,
provided that bpf_spin_{lock,unlock} are added in the right places:

  p = bpf_obj_new(typeof(*p));
  if (!p)
      return;

  bpf_spin_lock(lock);
  push_front(head, &p->node);
  p2 = container_of(pop_front(head)); /* p2 == p */
  if (!p2)
      return;
  bpf_spin_unlock(lock);

  bpf_obj_drop(p2);
  p->data = ...;

After bpf_obj_new, R0 is of type ptr_or_null_BTF_TYPE with a nonzero
ref_obj_id, let's say 1. After the null check, any regs holding p have
_or_null removed. After push_front, any regs with ref_obj_id == 1
become untrusted and have release_on_unlock set. After pop_front, R0 is
ptr_or_null_BTF_TYPE with a nonzero ref_obj_id different from the
bpf_obj_new result, let's say 2. The null check works the same way as
before. After bpf_spin_unlock, all release_on_unlock references are
clobbered, so the p->data access will result in an "invalid mem access
'scalar'" verifier error.

The verifier requires list manipulation functions to be in a critical
section and doesn't allow bpf_obj_new or bpf_obj_drop in the critical
section.
p2 == p, but having different ref_obj_ids is OK because only p2 will
escape bpf_spin_unlock unclobbered to be released by bpf_obj_drop, and
no free()s are allowed until we're out of the critical section. So by
the time we can call bpf_obj_drop, we only have one valid reference
remaining (the "trusted" one).

My OBJ_NON_OWNING_REF stuff in the rbtree patchsets had the same logic;
IIRC the only difference is that my "add to datastructure" function was
considered a RELEASE func that also did an ACQUIRE of a
release_on_unlock retval. But this was before the static lock checking
discussions, so the RELEASE/ACQUIRE could fail. I think directly
modifying the arg reg like you're doing in set_release_on_unlock is
cleaner.

> I was contemplating whether to simply drop this whole
> set_release_on_unlock logic entirely. Not sure it's worth the added
> complexity, at least for now. Once you push you simply lose ownership
> of the object and any registers are immediately killed.

I think being able to read / modify the datastructure node after it's
been added is pretty critical, at least from a UX perspective. Totally
fine with it being dropped from the series and experimented with later,
though.
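For reference, the pattern this version of the series does support is sketched below in BPF-style C, following the selftest quoted earlier: fully initialize the object before pushing it, since stores through the pushed (untrusted) pointer are rejected with "only read is supported". This is an illustrative fragment, not compilable standalone (kfunc and type declarations are elided, and the exact signatures should be checked against the series).

```c
/* Sketch: initialize before insert, per this version of the series. */
struct foo {
	struct bpf_list_node node;
	int data;
};

/* lock and head live in the same allocation, as the series requires */
static int add_elem(struct bpf_spin_lock *lock, struct bpf_list_head *head)
{
	struct foo *f = bpf_obj_new(typeof(*f));

	if (!f)
		return 1;

	f->data = 13;			/* OK: write before insertion */
	bpf_spin_lock(lock);
	bpf_list_push_front(head, &f->node);
	/* f->data = 13; here is rejected: "only read is supported" */
	bpf_spin_unlock(lock);
	return 0;
}
```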