Re: [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Yonghong Song wrote:
> 
> 
> On 11/21/22 2:56 PM, Martin KaFai Lau wrote:
> > On 11/21/22 12:01 PM, Yonghong Song wrote:
> >>
> >>
> >> On 11/21/22 11:41 AM, Martin KaFai Lau wrote:
> >>> On 11/21/22 9:05 AM, Yonghong Song wrote:
> >>>> @@ -4704,6 +4715,15 @@ static int check_ptr_to_btf_access(struct 
> >>>> bpf_verifier_env *env,
> >>>>           return -EACCES;
> >>>>       }
> >>>> +    /* Access rcu protected memory */
> >>>> +    if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
> >>>> +        !env->cur_state->active_rcu_lock) {
> >>>> +        verbose(env,
> >>>> +            "R%d is ptr_%s access rcu-protected memory with off=%d, 
> >>>> not rcu protected\n",
> >>>> +            regno, tname, off);
> >>>> +        return -EACCES;
> >>>> +    }
> >>>> +
> >>>>       if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
> >>>>           if (!btf_is_kernel(reg->btf)) {
> >>>>               verbose(env, "verifier internal error: reg->btf must 
> >>>> be kernel btf\n");
> >>>> @@ -4731,12 +4751,27 @@ static int check_ptr_to_btf_access(struct 
> >>>> bpf_verifier_env *env,
> >>>>       if (ret < 0)
> >>>>           return ret;
> >>>> +    /* The value is a rcu pointer. The load needs to be in a rcu 
> >>>> lock region,
> >>>> +     * similar to rcu_dereference().
> >>>> +     */
> >>>> +    if ((flag & MEM_RCU) && env->prog->aux->sleepable && 
> >>>> !env->cur_state->active_rcu_lock) {
> >>>> +        verbose(env,
> >>>> +            "R%d is rcu dereference ptr_%s with off=%d, not in 
> >>>> rcu_read_lock region\n",
> >>>> +            regno, tname, off);
> >>>> +        return -EACCES;
> >>>> +    }
> >>>
> >>> Would this make the existing rdonly use case fail?
> >>>
> >>> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
> >>> int task_real_parent(void *ctx)
> >>> {
> >>>      struct task_struct *task, *real_parent;
> >>>
> >>>      task = bpf_get_current_task_btf();
> >>>          real_parent = task->real_parent;
> >>>          bpf_printk("pid %u\n", real_parent->pid);
> >>>          return 0;
> >>> }
> >>
> >> Right, it will fail. To fix the issue, user can do
> >>     bpf_rcu_read_lock();
> >>     real_parent = task->real_parent;
> >>     bpf_printk("pid %u\n", real_parent->pid);
> >>     bpf_rcu_read_unlock();
> >>
> >> But this raised a good question. How do we deal with
> >> legacy sleepable programs with newly-added rcu tagging
> >> capabilities.
> >>
> >> My current option is to error out if rcu usage is not right.
> >> But this might break existing sleepable programs.
> >>
> >> Another option intends to not break existing, like above,
> >> codes. In this case, MEM_RCU will not tagged if it is
> >> not inside bpf_rcu_read_lock() region.
> > 
> > hmm.... it is to make MEM_RCU to mean a reg is protected by the current 
> > active_rcu_lock or not?
> 
> Yes, for example, in 'real_parent = task->real_parent' where
> 'real_parent' in task_struct is tagged with __rcu in the struct
> definition. So the 'real_parent' variable in the above assignment
> will be tagged with MEM_RCU.
> 
> > 
> >> In this case, the above non-rcu-protected code should work. And the
> >> following should work as well although it is a little
> >> bit awkward.
> >>     real_parent = task->real_parent; // real_parent not tagged with rcu
> >>     bpf_rcu_read_lock();
> >>     bpf_printk("pid %u\n", real_parent->pid);
> >>     bpf_rcu_read_unlock();
> > 
> > I think it should be fine.  bpf_rcu_read_lock() just not useful in this 
> > example but nothing break or crash.  Also, after bpf_rcu_read_unlock(), 
> > real_parent will continue to be readable because the MEM_RCU is not set?
> 
> That is correct. the variable real_parent is not tagged with MEM_RCU
> and it will stay that way for the rest of its life cycle.
> 
> With new PTR_TRUSTED mechanism, real_parent will be marked as normal
> PTR_TO_BTF_ID and it is not marked as PTR_UNTRUSTED for backward
> compatibility. So in the above code, real_parent->pid is just a normal
> load (not related to rcu/trusted/untrusted). People may think it
> is okay, but actually it does not okay. Verifier could add more state
> to issue proper warnings, but I am not sure whether it is worthwhile
> or not. As you mentioned, nothing breaks. It is just the current
> existing way. So we should be able to live with this.
> 
> > 
> > On top of the active_rcu_lock, should MEM_RCU be set only when it is 
> > dereferenced from a PTR_TRUSTED ptr (or with ref_obj_id != 0)?
> 
> I didn't consider PTR_TRUSTED because it is just introduced yesterday...
> 
> My current implementation inherits the old ptr_to_btf_id way where by
> default any ptr_to_btf_id is trusted. But since we have PTR_TRUSTED
> we should be able to use it for a stronger guarantee.
> 
> > I am thinking about the following more common case:
> > 
> >      /* bpf_get_current_task_btf() may need to be changed
> >       * to set PTR_TRUSTED at the retval?
> >       */
> >      /* task: PTR_TO_BTF_ID | PTR_TRUSTED */
> >      task = bpf_get_current_task_btf();
> > 
> >      bpf_rcu_read_lock();
> > 
> >      /* real_parent: PTR_TO_BTF_ID | PTR_TRUSTED | MEM_RCU */
> >      real_parent = task->real_parent;
> > 
> >          /* bpf_task_acquire() needs to change to use 
> > refcount_inc_not_zero */
> >      real_parent = bpf_task_acquire(real_parent);
> > 
> >      bpf_rcu_read_unlock();
> > 
> >      /* real_parent is accessible here (after checking NULL) and
> >       * can be passed to kfunc
> >       */
> > 
> 
> Yes, the above is a typical use case. Or alternatively after
>      real_parent = task->real_parent;
>      /* use real_parent inside the bpf_rcu_read_lock() region */
> 
> I will try to utilize PTR_TRUSTED concept in the next revision.

Also perhaps interesting is when task is read out of a map
with reference already pinned. I think you should clear
the MEM_RCU tag on all referenced objects?




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux