Re: [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 11/21/22 6:03 PM, John Fastabend wrote:
Yonghong Song wrote:


On 11/21/22 2:56 PM, Martin KaFai Lau wrote:
On 11/21/22 12:01 PM, Yonghong Song wrote:


On 11/21/22 11:41 AM, Martin KaFai Lau wrote:
On 11/21/22 9:05 AM, Yonghong Song wrote:
@@ -4704,6 +4715,15 @@ static int check_ptr_to_btf_access(struct
bpf_verifier_env *env,
           return -EACCES;
       }
+    /* Access rcu protected memory */
+    if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
+        !env->cur_state->active_rcu_lock) {
+        verbose(env,
+            "R%d is ptr_%s access rcu-protected memory with off=%d,
not rcu protected\n",
+            regno, tname, off);
+        return -EACCES;
+    }
+
       if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
           if (!btf_is_kernel(reg->btf)) {
               verbose(env, "verifier internal error: reg->btf must
be kernel btf\n");
@@ -4731,12 +4751,27 @@ static int check_ptr_to_btf_access(struct
bpf_verifier_env *env,
       if (ret < 0)
           return ret;
+    /* The value is a rcu pointer. The load needs to be in a rcu
lock region,
+     * similar to rcu_dereference().
+     */
+    if ((flag & MEM_RCU) && env->prog->aux->sleepable &&
!env->cur_state->active_rcu_lock) {
+        verbose(env,
+            "R%d is rcu dereference ptr_%s with off=%d, not in
rcu_read_lock region\n",
+            regno, tname, off);
+        return -EACCES;
+    }

Would this make the existing rdonly use case fail?

SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
int task_real_parent(void *ctx)
{
      struct task_struct *task, *real_parent;

      task = bpf_get_current_task_btf();
          real_parent = task->real_parent;
          bpf_printk("pid %u\n", real_parent->pid);
          return 0;
}

Right, it will fail. To fix the issue, user can do
     bpf_rcu_read_lock();
     real_parent = task->real_parent;
     bpf_printk("pid %u\n", real_parent->pid);
     bpf_rcu_read_unlock();

But this raised a good question. How do we deal with
legacy sleepable programs with newly-added rcu tagging
capabilities.

My current option is to error out if rcu usage is not right.
But this might break existing sleepable programs.

Another option intends to not break existing, like above,
codes. In this case, MEM_RCU will not tagged if it is
not inside bpf_rcu_read_lock() region.

hmm.... it is to make MEM_RCU to mean a reg is protected by the current
active_rcu_lock or not?

Yes, for example, in 'real_parent = task->real_parent' where
'real_parent' in task_struct is tagged with __rcu in the struct
definition. So the 'real_parent' variable in the above assignment
will be tagged with MEM_RCU.


In this case, the above non-rcu-protected code should work. And the
following should work as well although it is a little
bit awkward.
     real_parent = task->real_parent; // real_parent not tagged with rcu
     bpf_rcu_read_lock();
     bpf_printk("pid %u\n", real_parent->pid);
     bpf_rcu_read_unlock();

I think it should be fine.  bpf_rcu_read_lock() just not useful in this
example but nothing break or crash.  Also, after bpf_rcu_read_unlock(),
real_parent will continue to be readable because the MEM_RCU is not set?

That is correct. the variable real_parent is not tagged with MEM_RCU
and it will stay that way for the rest of its life cycle.

With new PTR_TRUSTED mechanism, real_parent will be marked as normal
PTR_TO_BTF_ID and it is not marked as PTR_UNTRUSTED for backward
compatibility. So in the above code, real_parent->pid is just a normal
load (not related to rcu/trusted/untrusted). People may think it
is okay, but actually it does not okay. Verifier could add more state
to issue proper warnings, but I am not sure whether it is worthwhile
or not. As you mentioned, nothing breaks. It is just the current
existing way. So we should be able to live with this.


On top of the active_rcu_lock, should MEM_RCU be set only when it is
dereferenced from a PTR_TRUSTED ptr (or with ref_obj_id != 0)?

I didn't consider PTR_TRUSTED because it is just introduced yesterday...

My current implementation inherits the old ptr_to_btf_id way where by
default any ptr_to_btf_id is trusted. But since we have PTR_TRUSTED
we should be able to use it for a stronger guarantee.

I am thinking about the following more common case:

      /* bpf_get_current_task_btf() may need to be changed
       * to set PTR_TRUSTED at the retval?
       */
      /* task: PTR_TO_BTF_ID | PTR_TRUSTED */
      task = bpf_get_current_task_btf();

      bpf_rcu_read_lock();

      /* real_parent: PTR_TO_BTF_ID | PTR_TRUSTED | MEM_RCU */
      real_parent = task->real_parent;

          /* bpf_task_acquire() needs to change to use
refcount_inc_not_zero */
      real_parent = bpf_task_acquire(real_parent);

      bpf_rcu_read_unlock();

      /* real_parent is accessible here (after checking NULL) and
       * can be passed to kfunc
       */


Yes, the above is a typical use case. Or alternatively after
      real_parent = task->real_parent;
      /* use real_parent inside the bpf_rcu_read_lock() region */

I will try to utilize PTR_TRUSTED concept in the next revision.

Also perhaps interesting is when task is read out of a map
with reference already pinned. I think you should clear
the MEM_RCU tag on all referenced objects?

The register tagged with MEM_RCU will not be a referenced obj.
MEM_RCU tag only appears to a register inside the rcu read lock
region as the rcu_reference() result. So the obj tagged with
MEM_RCU is protected with rcu read lock and it is valid and
trusted and there is no need to acquire additional reference.
If user calls another kfunc to acquire a reference, then
the resulted ptr will not have MEM_RCU tag but with non-zero
ref_obj_id.

The MEM_RCU reg will be invalidated when seeing bpf_rcu_read_unlock()
to prevent rcu-protected ptr to leak out of the rcu read lock region.



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux