Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 14 Sep 2023 16:26:49 -0700

On Thu, Sep 14, 2023 at 1:56 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
>
>
>
> 在 2023/9/13 21:53, Chuyi Zhou 写道:
> > Hello.
> >
> > 在 2023/9/12 15:01, Chuyi Zhou 写道:
> >> css_iter and process_iter should be used in rcu section. Specifically, in
> >> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
> >> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
> >> use them directly.
> >>
> >> This patch checks whether we are in rcu cs before we want to invoke
> >> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
> >> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
> >> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
> >> reject if reg->type is UNTRUSTED.
> >
> > I use the following BPF Prog to test this patch:
> >
> > SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
> > int iter_task_for_each_sleep(void *ctx)
> > {
> >      struct task_struct *task;
> >      struct task_struct *cur_task = bpf_get_current_task_btf();
> >
> >      if (cur_task->pid != target_pid)
> >          return 0;
> >      bpf_rcu_read_lock();
> >      bpf_for_each(process, task) {
> >          bpf_rcu_read_unlock();
> >          if (task->pid == target_pid)
> >              process_cnt += 1;
> >          bpf_rcu_read_lock();
> >      }
> >      bpf_rcu_read_unlock();
> >      return 0;
> > }
> >
> > Unfortunately, we can pass the verifier.
> >
> > Then I add some printk-messages before setting/clearing state to help
> > debug:
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index d151e6b43a5f..35f3fa9471a9 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct
> > bpf_verifier_env *env,
> >                  __mark_reg_known_zero(st);
> >                  st->type = PTR_TO_STACK; /* we don't have dedicated reg
> > type */
> >                  if (is_iter_need_rcu(meta)) {
> > +                       printk("mark reg_addr : %px", st);
> >                          if (in_rcu_cs(env))
> >                                  st->type |= MEM_RCU;
> >                          else
> > @@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct
> > bpf_verifier_env *env, struct bpf_insn *insn,
> >                          return -EINVAL;
> >                  } else if (rcu_unlock) {
> >                          bpf_for_each_reg_in_vstate(env->cur_state,
> > state, reg, ({
> > +                               printk("clear reg_addr : %px MEM_RCU :
> > %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type &
> > PTR_UNTRUSTED);
> >                                  if (reg->type & MEM_RCU) {
> > -                                       printk("clear reg addr : %lld",
> > reg);
> >                                          reg->type &= ~(MEM_RCU |
> > PTR_MAYBE_NULL);
> >                                          reg->type |= PTR_UNTRUSTED;
> >                                  }
> >
> >
> > The demsg log:
> >
> > [  393.705324] mark reg_addr : ffff88814e40e200
> >
> > [  393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0
> > PTR_UNTRUSTED : 0
> >
> > [  393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0
> > PTR_UNTRUSTED : 0
> >
> > [  393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0
> > PTR_UNTRUSTED : 0
> > ....
> > ....
> >
> > I didn't see ffff88814e40e200 is cleared as expected because
> > bpf_for_each_reg_in_vstate didn't find it.
> >
> > It seems when we are doing bpf_read_unlock() in the middle of iteration
> > and want to clearing state through bpf_for_each_reg_in_vstate, we can
> > not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in
> > mark_stack_slots_iter().
> >
>
> bpf_get_spilled_reg will skip slots if they are not STACK_SPILL, but in
> mark_stack_slots_iter() we has marked the slots *STACK_ITER*
>
> With the following change, everything seems work OK.
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index a3236651ec64..83c5ecccadb4 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -387,7 +387,7 @@ struct bpf_verifier_state {
>
>   #define bpf_get_spilled_reg(slot, frame)                               \
>          (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
> -         (frame->stack[slot].slot_type[0] == STACK_SPILL))             \
> +         (frame->stack[slot].slot_type[0] == STACK_SPILL ||
> frame->stack[slot].slot_type[0] == STACK_ITER))            \
>           ? &frame->stack[slot].spilled_ptr : NULL)
>
> I am not sure whether this would harm some logic implicitly when using
> bpf_get_spilled_reg/bpf_for_each_spilled_reg in other place. If so,
> maybe we should add a extra parameter to control the picking behaviour.
>
> #define bpf_get_spilled_reg(slot, frame, stack_type)
>                         \
>         (((slot < frame->allocated_stack / BPF_REG_SIZE) &&             \
>           (frame->stack[slot].slot_type[0] == stack_type))              \
>          ? &frame->stack[slot].spilled_ptr : NULL)
>
> Thanks.

I don't think it's safe to just make bpf_get_spilled_reg, and
subsequently bpf_for_each_reg_in_vstate and bpf_for_each_spilled_reg
just suddenly start iterating iterator states and/or dynptrs. At least
some of existing uses of those assume they are really working just
with registers.