Re: [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 8/21/23 10:47 PM, David Marchevsky wrote:
On 8/21/23 10:37 PM, Yonghong Song wrote:


On 8/21/23 12:33 PM, Dave Marchevsky wrote:
An earlier patch in the series ensures that the underlying memory of
nodes with bpf_refcount - which can have multiple owners - is not reused
until RCU grace period has elapsed. This prevents
use-after-free with non-owning references that may point to
recently-freed memory. While RCU read lock is held, it's safe to
dereference such a non-owning ref, as by definition RCU GP couldn't have
elapsed and therefore underlying memory couldn't have been reused.

  From the perspective of verifier "trustedness" non-owning refs to
refcounted nodes are now trusted only in RCU CS and therefore should no
longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
MEM_RCU in order to reflect this new state.

Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
---
   include/linux/bpf.h   |  3 ++-
   kernel/bpf/verifier.c | 13 ++++++++++++-
   2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eced6400f778..12596af59c00 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -653,7 +653,8 @@ enum bpf_type_flag {
       MEM_RCU            = BIT(13 + BPF_BASE_TYPE_BITS),
         /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
-     * Currently only valid for linked-list and rbtree nodes.
+     * Currently only valid for linked-list and rbtree nodes. If the nodes
+     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
        */
       NON_OWN_REF        = BIT(14 + BPF_BASE_TYPE_BITS),
   diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8db0afa5985c..55607ab30522 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
       case PTR_TO_BTF_ID | PTR_TRUSTED:
       case PTR_TO_BTF_ID | MEM_RCU:
       case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
+    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
           /* When referenced PTR_TO_BTF_ID is passed to release function,
            * its fixed offset must be 0. In the other cases, fixed offset
            * can be non-zero. This was already checked above. So pass
@@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
   static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
   {
       struct bpf_verifier_state *state = env->cur_state;
+    struct btf_record *rec = reg_btf_record(reg);
         if (!state->active_lock.ptr) {
           verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
@@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
       }
         reg->type |= NON_OWN_REF;
+    if (rec->refcount_off >= 0)
+        reg->type |= MEM_RCU;

Should the above MEM_RCU marking be done unless reg access is in
rcu critical section?

I think it is fine, since non-owning references currently exist only within
spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption
disabled + spin_lock CS should imply RCU CS.

   [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/

Okay, I see. Thanks for the pointer.
If MEM_RCU is marked not in explicit rcu cs, it does make sense
to clobber everything when bpf_spin_unlock is hit.
But see below.



I think we still have issues for state resetting
with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
will try to convert the reg state to PTR_UNTRUSTED.

Let us say reg state is
   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU

(1). If hitting bpf_spin_unlock(), since MEM_RCU is in
the reg state, the state should become
   PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
some additional code might be needed so we wont have
verifier complaints about ref_obj_id == 0.

(2). If hitting bpf_rcu_read_unlock(), the state should become
   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
since register access still in bpf_spin_lock() region.

I agree w/ your comment in side reply stating that this
case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS
is currently not allowed.


Does this make sense?



IIUC the specific reg state flow you're recommending is based on the convos
we've had over the past few weeks re: getting rid of special non-owning ref
lifetime rules, instead using RCU as much as possible. Specifically, this
recommended change would remove non-owning ref clobbering, instead just removing
NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed
to collection kfuncs (refcount_acquire, etc).

I agree that in general we'll be able to loosen the lifetime logic for
non-owning refs, and your specific suggestion sounds reasonable. IMO it's
better to ship that as a separate series, though, as this series was meant
to be the minimum changes necessary to re-enable bpf_refcount_acquire, and
it's expanded a bit past that already. Easier to reason about the rest
of this series' changes without having to account for clobbering changes.

I removed
  >>> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
and
  >>> +    if (rec->refcount_off >= 0)
  >>> +        reg->type |= MEM_RCU;

All 'refcount' selftests seem working fine.
If this is indeed unnecessary for this patch set, maybe we can delay
this RCU marking thing for a future patch set?


+
       return 0;
   }
   @@ -11328,6 +11333,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
           struct bpf_func_state *state;
           struct bpf_reg_state *reg;
   +        if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
+            verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
+            return -EACCES;
+        }
+
           if (rcu_lock) {
               verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
               return -EINVAL;
@@ -16689,7 +16699,8 @@ static int do_check(struct bpf_verifier_env *env)
                       return -EINVAL;
                   }
   -                if (env->cur_state->active_rcu_lock) {
+                if (env->cur_state->active_rcu_lock &&
+                    !in_rbtree_lock_required_cb(env)) {
                       verbose(env, "bpf_rcu_read_unlock is missing\n");
                       return -EINVAL;
                   }




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux