Re: [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected

On 8/22/23 4:45 PM, Alexei Starovoitov wrote:
On Tue, Aug 22, 2023 at 01:47:01AM -0400, David Marchevsky wrote:
On 8/21/23 10:37 PM, Yonghong Song wrote:


On 8/21/23 12:33 PM, Dave Marchevsky wrote:
An earlier patch in the series ensures that the underlying memory of
nodes with bpf_refcount - which can have multiple owners - is not reused
until an RCU grace period has elapsed. This prevents
use-after-free with non-owning references that may point to
recently-freed memory. While the RCU read lock is held, it's safe to
dereference such a non-owning ref, as by definition an RCU GP couldn't have
elapsed and therefore the underlying memory couldn't have been reused.

From the perspective of verifier "trustedness", non-owning refs to
refcounted nodes are now trusted only in RCU CS and therefore should no
longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
MEM_RCU in order to reflect this new state.

Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
---
 include/linux/bpf.h   |  3 ++-
 kernel/bpf/verifier.c | 13 ++++++++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eced6400f778..12596af59c00 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -653,7 +653,8 @@ enum bpf_type_flag {
     MEM_RCU            = BIT(13 + BPF_BASE_TYPE_BITS),

     /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
-     * Currently only valid for linked-list and rbtree nodes.
+     * Currently only valid for linked-list and rbtree nodes. If the nodes
+     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
      */
     NON_OWN_REF        = BIT(14 + BPF_BASE_TYPE_BITS),

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8db0afa5985c..55607ab30522 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
     case PTR_TO_BTF_ID | PTR_TRUSTED:
     case PTR_TO_BTF_ID | MEM_RCU:
     case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
+    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
         /* When referenced PTR_TO_BTF_ID is passed to release function,
          * its fixed offset must be 0. In the other cases, fixed offset
          * can be non-zero. This was already checked above. So pass
@@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
 static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
     struct bpf_verifier_state *state = env->cur_state;
+    struct btf_record *rec = reg_btf_record(reg);

     if (!state->active_lock.ptr) {
         verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
@@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
     }

     reg->type |= NON_OWN_REF;
+    if (rec->refcount_off >= 0)
+        reg->type |= MEM_RCU;

Should the above MEM_RCU marking be skipped unless the reg access is in
an RCU critical section?

I think it is fine, since non-owning references currently exist only within
spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption
disabled + spin_lock CS should imply RCU CS.

   [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/


I think we still have issues for state resetting
with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
will try to convert the reg state to PTR_UNTRUSTED.

Let us say reg state is
   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU

(1). If we hit bpf_spin_unlock(), since MEM_RCU is in
the reg state, the state should become
   PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
Some additional code might be needed so we won't get
verifier complaints about ref_obj_id == 0.

(2). If we hit bpf_rcu_read_unlock(), the state should become
   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
since the register access is still in the bpf_spin_lock() region.

I agree w/ your comment in side reply stating that this
case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS
is currently not allowed.
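
The two transitions proposed in (1) and (2) can be written down as a small user-space model. This is only a sketch under stated assumptions: the flag values are illustrative rather than the kernel's, and proposed_on_spin_unlock() and proposed_on_rcu_read_unlock() are hypothetical helper names, not verifier functions. It models the proposal, not what the verifier does today.

```c
#include <assert.h>
#include <stdint.h>

/* Toy flag bits. The values are illustrative, NOT the kernel's. */
#define PTR_TO_BTF_ID (1u << 0)
#define MEM_ALLOC     (1u << 1)
#define NON_OWN_REF   (1u << 2)
#define MEM_RCU       (1u << 3)

/*
 * Proposal (1), hypothetical helper: on bpf_spin_unlock(), a non-owning
 * ref that is also MEM_RCU keeps its RCU protection and only loses
 * NON_OWN_REF. Other states are outside the proposal and are returned
 * unchanged by this model.
 */
uint32_t proposed_on_spin_unlock(uint32_t type)
{
    if ((type & NON_OWN_REF) && (type & MEM_RCU))
        return type & ~NON_OWN_REF;
    return type;
}

/*
 * Proposal (2), hypothetical helper: on bpf_rcu_read_unlock() inside a
 * spin_lock region, the ref stays non-owning and only loses MEM_RCU.
 */
uint32_t proposed_on_rcu_read_unlock(uint32_t type)
{
    if ((type & NON_OWN_REF) && (type & MEM_RCU))
        return type & ~MEM_RCU;
    return type;
}

/* Usage example: both proposals applied to the state named in the thread. */
void proposed_transitions_demo(void)
{
    uint32_t full = PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU;

    assert(proposed_on_spin_unlock(full) ==
           (PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU));
    assert(proposed_on_rcu_read_unlock(full) ==
           (PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF));
}
```
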


Does this make sense?



IIUC the specific reg state flow you're recommending is based on the convos
we've had over the past few weeks re: getting rid of special non-owning ref
lifetime rules, instead using RCU as much as possible. Specifically, this
recommended change would remove non-owning ref clobbering, instead just removing
NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed
to collection kfuncs (refcount_acquire, etc).

Overall the patch set makes sense to me, but I want to clarify the above.
My understanding is that after the patch set is applied, bpf_spin_unlock()
will still call invalidate_non_owning_refs(), so what Yonghong is saying in (1)
is not correct.
Instead, a PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU reg will go through
mark_reg_invalid().

I said it 'should become ...', but you are right. Right now, it will do
mark_reg_invalid(). So it is correct, just maybe a little conservative.
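
For comparison, the behavior described here (the MEM_RCU marking from the patch, followed by invalidation on unlock) can be sketched in user space as follows. Assumptions: the flag values are illustrative, struct toy_reg and the toy_* functions are hypothetical stand-ins, refcount_off is stored directly on the toy register (the patch reads it from the btf_record), and TOY_INVALID merely stands in for whatever mark_reg_invalid() leaves behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy flag bits. The values are illustrative, NOT the kernel's. */
#define PTR_TO_BTF_ID (1u << 0)
#define MEM_ALLOC     (1u << 1)
#define NON_OWN_REF   (1u << 2)
#define MEM_RCU       (1u << 3)
#define TOY_INVALID   0u /* stand-in for an invalidated register */

/* Hypothetical, heavily simplified register state. */
struct toy_reg {
    uint32_t type;
    int refcount_off; /* < 0: node type has no bpf_refcount field */
};

/* Mirrors the marking added by the patch: refcounted nodes also get MEM_RCU. */
void toy_ref_set_non_owning(struct toy_reg *reg)
{
    reg->type |= NON_OWN_REF;
    if (reg->refcount_off >= 0)
        reg->type |= MEM_RCU;
}

/* Models unlock as discussed: every non-owning ref is invalidated. */
void toy_spin_unlock(struct toy_reg *regs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (regs[i].type & NON_OWN_REF)
            regs[i].type = TOY_INVALID;
}

/* Usage example: a refcounted node is marked, then clobbered on unlock. */
void toy_unlock_demo(void)
{
    struct toy_reg reg = { PTR_TO_BTF_ID | MEM_ALLOC, 8 };

    toy_ref_set_non_owning(&reg);
    assert(reg.type == (PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU));
    toy_spin_unlock(&reg, 1);
    assert(reg.type == TOY_INVALID);
}
```

The MEM_RCU bit makes no difference at unlock time in this model, which is the "a little conservative" point above.
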


Re: (2) even if/when bpf_rcu_read_unlock() is allowed inside a spinlocked region,
it will convert PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU to
PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | PTR_UNTRUSTED
which is a buggy combination that we would need to address if rcu_unlock is eventually allowed.

Did I get it right?
If so, I think the whole set is good to go.
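
The conversion described for (2) can be sketched the same way. Assumptions: the flag values are illustrative, toy_rcu_read_unlock() is a hypothetical model of the MEM_RCU to PTR_UNTRUSTED conversion as stated in this message (not the verifier's code), and toy_is_buggy_combination() is a made-up check that only names the problematic state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy flag bits. The values are illustrative, NOT the kernel's. */
#define PTR_TO_BTF_ID (1u << 0)
#define MEM_ALLOC     (1u << 1)
#define NON_OWN_REF   (1u << 2)
#define MEM_RCU       (1u << 3)
#define PTR_UNTRUSTED (1u << 4)

/* Model of the conversion described above: MEM_RCU becomes PTR_UNTRUSTED. */
uint32_t toy_rcu_read_unlock(uint32_t type)
{
    if (type & MEM_RCU) {
        type &= ~MEM_RCU;
        type |= PTR_UNTRUSTED;
    }
    return type;
}

/* Hypothetical check: a non-owning ref must never also be untrusted. */
bool toy_is_buggy_combination(uint32_t type)
{
    return (type & NON_OWN_REF) && (type & PTR_UNTRUSTED);
}

/* Usage example: the state from the thread ends up in the buggy combination. */
void toy_rcu_unlock_demo(void)
{
    uint32_t before = PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU;
    uint32_t after = toy_rcu_read_unlock(before);

    assert(after == (PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | PTR_UNTRUSTED));
    assert(toy_is_buggy_combination(after));
}
```
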




