On Sun, Dec 19, 2021 at 11:13:18PM IST, Alexei Starovoitov wrote: > On Sat, Dec 18, 2021 at 9:25 PM Kumar Kartikeya Dwivedi > <memxor@xxxxxxxxx> wrote: > > > > On Sun, Dec 19, 2021 at 10:35:18AM IST, Alexei Starovoitov wrote: > > > On Sat, Dec 18, 2021 at 8:33 PM Kumar Kartikeya Dwivedi > > > <memxor@xxxxxxxxx> wrote: > > > > > > > > It is, but into parent_ref_obj_id, to match during release_reference. > > > > > > > > > Shouldn't r2 get a different ref_obj_id after r2 = r1->next ? > > > > > > > > It's ref_obj_id is still 0. > > > > > > > > Thinking about this more, we actually only need 1 extra bit of information in > > > > reg_state, not even a new member. We can simply copy ref_obj_id and set this > > > > bit, then we can reject this register during release but consider it during > > > > release_reference. > > > > > > It seems to me that this patch created the problem and it's trying > > > to fix it at the same time. > > > > > > > Yes, sort of. Maybe I need to improve the commit message? I give an example > > below, and the first half of commit explains that if we simply did copy > > ref_obj_id, it would lead to the case in the previous mail (same BTF ID ptr can > > be passed), so we need to do something different. > > > > Maybe that is what is confusing you. > > I'm still confused. > Why does mark_btf_ld_reg() need to copy ref_obj_id ? > It should keep it as zero. So that we can find deref pointers obtained from the reg having that ref_obj_id when it is released, and invalidate them. But since directly storing in ref_obj_id of deref dst_reg will be bad (if we get same BTF ID from deref we could now pass it to release kfunc), we add a new member and match on that. > mark_btf_ld_reg() is used in deref only. > The ref_obj_id is assigned by check_helper_call(). > r2 = r0; will copy it, but > r2 = r0->next; will keep r2->ref_obj_id as zero. > > > > mark_btf_ld_reg() shouldn't be copying ref_obj_id. > > > If it keeps it as zero the problem will not happen, no? > > > > It is copying it but writing it to parent_ref_obj_id. It keeps ref_obj_id as 0 > > for all deref pointers. > > > > r1 = acq(); // r1.ref = acquire_reference_state(); > > ref = N > > r2 = r1->a; // mark_btf_ld_reg -> copy r1.(ref ?: parent_ref) -> so r2.parent_ref = r1.ref > > r3 = r2->b; // mark_btf_ld_reg -> copy r2.(ref ?: parent_ref) -> so r3.parent_ref = r2.parent_ref > > r4 = r3->c; // mark_btf_ld_reg -> copy r3.(ref ?: parent_ref) -> so r4.parent_ref = r3.parent_ref > > rel(r1); // if (reg.ref == r1.ref || reg.parent_ref == r1.ref) invalidate(reg) > > > > As you see, mark_btf_ld_reg only ever writes to parent_ref_obj_id, not > > ref_obj_id. It just copies ref_obj_id when it is set, over parent_ref_obj_id, > > and only one of two can be set. > > I don't understand why such logic is needed. Ok, let me try to explain once how I arrived at it. If you still don't like it, I'll drop it from the series. So until this patch, when we do the following: struct nf_conn *ct = bpf_xdp_ct_lookup(...); if (ct) { struct nf_conn *master = ct->master; bpf_ct_release(ct); unsigned long status = master->status; // I want to prevent this } ... this will work, which is ok (as in won't crash the kernel) since the load will be converted to BPF_PROBE_MEM, but I want to disallow this case since it is obviously incorrect. The obvious solution (to me) was to kill all registers and stack slots for deref pointers. My first naive solution was to simply copy ref_obj_id on mark_btf_ld_reg, so that it can be matched and released from release_reference. But then I noticed that if the BTF ID is same, there is no difference when it is passed to release kfunc compared to the original register it was loaded from. struct nf_conn *ct = bpf_xdp_ct_lookup(...); if (ct) { struct nf_conn *master = ct->master; // copied ref_obj_id bpf_ct_release(master); // works, but shouldn't! } So the code needed some way to distinguish this deref pointer that must be invalidated only when its 'parent' goes away. Hence the introduction of parent_ref_obj_id, and the invariant that only one of ref_obj_id or parent_ref_obj_id must be set. Now release_reference becomes: kill all registers/stack slots with ref == ref_obj_id (directly acquired) and ref == parent_ref_obj_id (formed using deref chains (of arbitrary length)). Either a register is dst_reg of acquire kfunc, or deref of that reg, but can't be both. Only the first case can be passed to release kfunc (or a copy from mov), not the second (since it will always have invalid ref_obj_id == 0). If this is still not inspiring confidence, I'll drop the patch for now. -- Kartikeya