On 11/4/22 3:42 AM, Kumar Kartikeya Dwivedi wrote:
> On Fri, Nov 04, 2022 at 11:26:39AM IST, Dave Marchevsky wrote:
>> On 11/3/22 3:10 PM, Kumar Kartikeya Dwivedi wrote:
>>> Add a linked list API for use in BPF programs, where it expects
>>> protection from the bpf_spin_lock in the same allocation as the
>>> bpf_list_head. Future patches will extend the same infrastructure to
>>> have different flavors with varying protection domains and visibility
>>> (e.g. percpu variant with local_t protection, usable in NMI progs).
>>>
>>> The following functions are added to kick things off:
>>>
>>>  bpf_list_push_front
>>>  bpf_list_push_back
>>>  bpf_list_pop_front
>>>  bpf_list_pop_back
>>>
>>> The lock protecting the bpf_list_head needs to be taken for all
>>> operations.
>>>
>>> Once a node has been added to the list, its pointer changes to
>>> PTR_UNTRUSTED. However, it is only released once the lock protecting
>>> the list is unlocked. For such local kptrs with PTR_UNTRUSTED set but
>>> an active ref_obj_id, it is still permitted to read and write to them
>>> as long as the lock is held.
>>
>> I think "still permitted to ... write to them" is not accurate
>> for this version of the series. In v2 you mentioned [0]:
>>
>> """
>> I have switched things a bit to disallow stores, which is a bug right
>> now in this set, because one can do this:
>>
>>   push_front(head, &p->node);
>>   p2 = container_of(pop_front(head));
>>   // p2 == p
>>   bpf_obj_drop(p2);
>>   p->data = ...;
>>
>> One can always fully initialize the object _before_ inserting it into
>> the list; in some cases that will be the requirement (like adding to
>> RCU protected lists) for correctness.
>> """
>>
>> I confirmed this is currently the case by moving the data write after
>> list_push in the selftest and running it:
>>
>>   @@ -87,8 +87,8 @@ static __always_inline int list_push_pop(struct bpf_spin_lock *lock,
>>    }
>>
>>    bpf_spin_lock(lock);
>>   -f->data = 13;
>>    bpf_list_push_front(head, &f->node);
>>   +f->data = 13;
>>    bpf_spin_unlock(lock);
>>
>> Got "only read is supported" from the verifier.
>> I think it's fine to punt on supporting writes for now and do it in
>> followups.
>>
>
> Thanks for catching it, I'll fix up the commit message.
>
> Also, just to manage expectations, I think enabling writes after
> pushing the object won't be possible to make safe, unless the
> definition of "safe" is twisted.
>
> As shown in that example, we can reach a point where the object has
> been freed but we hold an untrusted pointer to it. Once it has been
> freed, the object can be reallocated and be in use again concurrently,
> possibly as a different type.

I think this patchset prevents that example from passing the verifier,
provided that bpf_spin_{lock,unlock} are added in the right places:

  p = bpf_obj_new(typeof(*p));
  if (!p)
      return;

  bpf_spin_lock(lock);
  push_front(head, &p->node);
  p2 = container_of(pop_front(head)); /* p2 == p */
  if (!p2)
      return;
  bpf_spin_unlock(lock);

  bpf_obj_drop(p2);
  p->data = ...;

After bpf_obj_new, R0 is of type ptr_or_null_BTF_TYPE with a nonzero
ref_obj_id, let's say 1. After the null check, any regs holding p have
_or_null removed. After push_front, any regs with ref_obj_id == 1
become untrusted and have release_on_unlock set. After pop_front, R0 is
ptr_or_null_BTF_TYPE with a nonzero ref_obj_id different from the
bpf_obj_new result, let's say 2. The null check works the same way as
before. After bpf_spin_unlock, all release_on_unlock references are
clobbered, so the p->data access will result in an "invalid mem access
'scalar'" verifier error.

The verifier requires list manipulation functions to be in a critical
section and doesn't allow bpf_obj_new or bpf_obj_drop in the critical
section.
p2 == p, but having different ref_obj_ids is OK because only p2 will
escape bpf_spin_unlock unclobbered to be released by bpf_obj_drop, and
no free()s are allowed until we're out of the critical section. So by
the time we can call bpf_obj_drop, we only have one valid reference
remaining (the "trusted" one).

My OBJ_NON_OWNING_REF stuff in the rbtree patchsets had the same logic;
IIRC the only difference is that my "add to datastructure" function was
considered a RELEASE func that also did an ACQUIRE of a
release_on_unlock retval. But this was before the static lock checking
discussions, so the RELEASE/ACQUIRE could fail. I think directly
modifying the arg reg like you're doing in set_release_on_unlock is
cleaner.

> I was contemplating whether to simply drop this whole
> set_release_on_unlock logic entirely. Not sure it's worth the added
> complexity, at least for now. Once you push you simply lose ownership
> of the object and any registers are immediately killed.

I think being able to read / modify the datastructure node after it's
been added is pretty critical, at least from a UX perspective. Totally
fine with it being dropped from the series and experimented with later,
though.
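For reference, the pattern this version of the series does support is sketched below in BPF-style C, following the selftest quoted earlier: fully initialize the object before pushing it, since stores through the pushed (untrusted) pointer are rejected with "only read is supported". This is an illustrative fragment, not compilable standalone (kfunc and type declarations are elided, and the exact signatures should be checked against the series).

```c
/* Sketch: initialize before insert, per this version of the series. */
struct foo {
	struct bpf_list_node node;
	int data;
};

/* lock and head live in the same allocation, as the series requires */
static int add_elem(struct bpf_spin_lock *lock, struct bpf_list_head *head)
{
	struct foo *f = bpf_obj_new(typeof(*f));

	if (!f)
		return 1;

	f->data = 13;			/* OK: write before insertion */
	bpf_spin_lock(lock);
	bpf_list_push_front(head, &f->node);
	/* f->data = 13; here is rejected: "only read is supported" */
	bpf_spin_unlock(lock);
	return 0;
}
```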