On Thu, Sep 14, 2023 at 10:46 PM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote: > > Hello. > > 在 2023/9/15 07:26, Andrii Nakryiko 写道: > > On Thu, Sep 14, 2023 at 1:56 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote: > >> > >> > >> > >> 在 2023/9/13 21:53, Chuyi Zhou 写道: > >>> Hello. > >>> > >>> 在 2023/9/12 15:01, Chuyi Zhou 写道: > >>>> css_iter and process_iter should be used in rcu section. Specifically, in > >>>> sleepable progs explicit bpf_rcu_read_lock() is needed before use these > >>>> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to > >>>> use them directly. > >>>> > >>>> This patch checks whether we are in rcu cs before we want to invoke > >>>> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in > >>>> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would > >>>> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will > >>>> reject if reg->type is UNTRUSTED. > >>> > >>> I use the following BPF Prog to test this patch: > >>> > >>> SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") > >>> int iter_task_for_each_sleep(void *ctx) > >>> { > >>> struct task_struct *task; > >>> struct task_struct *cur_task = bpf_get_current_task_btf(); > >>> > >>> if (cur_task->pid != target_pid) > >>> return 0; > >>> bpf_rcu_read_lock(); > >>> bpf_for_each(process, task) { > >>> bpf_rcu_read_unlock(); > >>> if (task->pid == target_pid) > >>> process_cnt += 1; > >>> bpf_rcu_read_lock(); > >>> } > >>> bpf_rcu_read_unlock(); > >>> return 0; > >>> } > >>> > >>> Unfortunately, we can pass the verifier. > >>> > >>> Then I add some printk-messages before setting/clearing state to help > >>> debug: > >>> > >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > >>> index d151e6b43a5f..35f3fa9471a9 100644 > >>> --- a/kernel/bpf/verifier.c > >>> +++ b/kernel/bpf/verifier.c > >>> @@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct > >>> bpf_verifier_env *env, > >>> __mark_reg_known_zero(st); > >>> st->type = PTR_TO_STACK; /* we don't have dedicated reg > >>> type */ > >>> if (is_iter_need_rcu(meta)) { > >>> + printk("mark reg_addr : %px", st); > >>> if (in_rcu_cs(env)) > >>> st->type |= MEM_RCU; > >>> else > >>> @@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct > >>> bpf_verifier_env *env, struct bpf_insn *insn, > >>> return -EINVAL; > >>> } else if (rcu_unlock) { > >>> bpf_for_each_reg_in_vstate(env->cur_state, > >>> state, reg, ({ > >>> + printk("clear reg_addr : %px MEM_RCU : > >>> %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type & > >>> PTR_UNTRUSTED); > >>> if (reg->type & MEM_RCU) { > >>> - printk("clear reg addr : %lld", > >>> reg); > >>> reg->type &= ~(MEM_RCU | > >>> PTR_MAYBE_NULL); > >>> reg->type |= PTR_UNTRUSTED; > >>> } > >>> > >>> > >>> The demsg log: > >>> > >>> [ 393.705324] mark reg_addr : ffff88814e40e200 > >>> > >>> [ 393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0 > >>> PTR_UNTRUSTED : 0 > >>> > >>> [ 393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0 > >>> PTR_UNTRUSTED : 0 > >>> > >>> [ 393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0 > >>> PTR_UNTRUSTED : 0 > >>> .... > >>> .... > >>> > >>> I didn't see ffff88814e40e200 is cleared as expected because > >>> bpf_for_each_reg_in_vstate didn't find it. > >>> > >>> It seems when we are doing bpf_read_unlock() in the middle of iteration > >>> and want to clearing state through bpf_for_each_reg_in_vstate, we can > >>> not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in > >>> mark_stack_slots_iter(). > >>> > >> > >> bpf_get_spilled_reg will skip slots if they are not STACK_SPILL, but in > >> mark_stack_slots_iter() we has marked the slots *STACK_ITER* > >> > >> With the following change, everything seems work OK. > >> > >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h > >> index a3236651ec64..83c5ecccadb4 100644 > >> --- a/include/linux/bpf_verifier.h > >> +++ b/include/linux/bpf_verifier.h > >> @@ -387,7 +387,7 @@ struct bpf_verifier_state { > >> > >> #define bpf_get_spilled_reg(slot, frame) \ > >> (((slot < frame->allocated_stack / BPF_REG_SIZE) && \ > >> - (frame->stack[slot].slot_type[0] == STACK_SPILL)) \ > >> + (frame->stack[slot].slot_type[0] == STACK_SPILL || > >> frame->stack[slot].slot_type[0] == STACK_ITER)) \ > >> ? &frame->stack[slot].spilled_ptr : NULL) > >> > >> I am not sure whether this would harm some logic implicitly when using > >> bpf_get_spilled_reg/bpf_for_each_spilled_reg in other place. If so, > >> maybe we should add a extra parameter to control the picking behaviour. > >> > >> #define bpf_get_spilled_reg(slot, frame, stack_type) > >> \ > >> (((slot < frame->allocated_stack / BPF_REG_SIZE) && \ > >> (frame->stack[slot].slot_type[0] == stack_type)) \ > >> ? &frame->stack[slot].spilled_ptr : NULL) > >> > >> Thanks. > > > > I don't think it's safe to just make bpf_get_spilled_reg, and > > subsequently bpf_for_each_reg_in_vstate and bpf_for_each_spilled_reg > > just suddenly start iterating iterator states and/or dynptrs. At least > > some of existing uses of those assume they are really working just > > with registers. > > IIUC, when we are doing bpf_rcu_unlock, we do need to clear the state of > reg including STACK_ITER. > > Maybe here we only need change the logic when using > bpf_for_each_reg_in_vstate to clear state in bpf_rcu_unlock and keep > everything else unchanged ? Right, maybe. I see 10 uses of bpf_for_each_reg_in_vstate() in kernel/bpf/verifier.c. Before we change the definition of bpf_for_each_reg_in_vstate() we should validate that iterating dynptr and iter states doesn't break any of them, that's all. > > Thanks.