On 12/7/22 7:47 PM, Alexei Starovoitov wrote:
> On Wed, Dec 07, 2022 at 06:39:38PM -0500, Dave Marchevsky wrote:
>>>>
>>>> 0000000000000000 <less>:
>>>> ;       return node_a->key < node_b->key;
>>>>        0:  79 22 f0 ff 00 00 00 00  r2 = *(u64 *)(r2 - 0x10)
>>>>        1:  79 11 f0 ff 00 00 00 00  r1 = *(u64 *)(r1 - 0x10)
>>>>        2:  b4 00 00 00 01 00 00 00  w0 = 0x1
>>>> ;       return node_a->key < node_b->key;
>>>
>>> I see. That's the same bug.
>>> The args to callback should have been PTR_TO_BTF_ID | PTR_TRUSTED with
>>> correct positive offset.
>>> Then node_a = container_of(a, struct node_data, node);
>>> would have produced correct offset into proper btf_id.
>>>
>>> The verifier should be passing into less() the btf_id
>>> of struct node_data instead of btf_id of struct bpf_rb_node.
>>>
>>
>> The verifier is already passing the struct node_data type, not
>> bpf_rb_node. For less() args and the rbtree_{first,remove} retval,
>> mark_reg_datastructure_node - added in patch 8 - is doing as you
>> describe.
>>
>> The verifier sees less()'s arg regs as R=ptr_to_node_data(off=16). If it
>> instead passed R=ptr_to_bpf_rb_node(off=0), attempting to access
>> *(reg - 0x10) would cause a verifier err.
>
> Ahh. I finally got it :)
> Please put these details in the commit log when you respin.

Glad it finally started making sense. Will do a big improvement of the
patch summaries after addressing other feedback from this series.

>>>>        3:  cd 21 01 00 00 00 00 00  if r1 s< r2 goto +0x1 <LBB2_2>
>>>>        4:  b4 00 00 00 00 00 00 00  w0 = 0x0
>>>>
>>>> 0000000000000028 <LBB2_2>:
>>>> ;       return node_a->key < node_b->key;
>>>>        5:  95 00 00 00 00 00 00 00  exit
>>>>
>>>> Insns 0 and 1 are loading node_b->key and node_a->key, respectively,
>>>> using a negative insn->off. The verifier's view of R1 and R2 before
>>>> insn 0 is untrusted_ptr_node_data(off=16).
>>>> If there were some intermediate insns storing the result of
>>>> container_of() before dereferencing:
>>>>
>>>>   r3 = r2 - 0x10
>>>>   r2 = *(u64 *)(r3)
>>>>
>>>> the verifier would see R3 as untrusted_ptr_node_data(off=0), and the
>>>> load for r2 would have insn->off = 0. But LLVM decides to just do a
>>>> load-with-offset using the original arg ptrs to less() instead of
>>>> storing the container_of() ptr adjustments.
>>>>
>>>> Since the container_of() usage and code pattern in the above example's
>>>> less() aren't particularly specific to this series, I think there are
>>>> other scenarios where such code would be generated, and so considered
>>>> this a general bugfix in the cover letter.
>>>
>>> imo the negative offset looks specific to two misuses of PTR_UNTRUSTED
>>> in this set.
>>>
>>
>> If I used PTR_TRUSTED here, the JITted instructions would still do a
>> load like r2 = *(u64 *)(r2 - 0x10). There would just be no BPF_PROBE_MEM
>> runtime checking insns generated, avoiding the negative insn issue
>> there. But the negative insn->off load being generated is not specific
>> to PTR_UNTRUSTED.
>
> yep.
>
>>>
>>> Exactly. More flags will only increase the confusion.
>>> Please try to make callback args proper PTR_TRUSTED and disallow
>>> calling specific rbtree kfuncs while inside this particular callback to
>>> prevent recursion. That would solve all these issues, no?
>>> Writing into such PTR_TRUSTED should still be allowed inside the cb
>>> though it's bogus.
>>>
>>> Consider less() receiving a btf_id ptr_trusted of struct node_data that
>>> contains both a linked list and an rbtree.
>>> It should still be safe to operate on the linked list part of that node
>>> from less(), though it's not something we would ever recommend.
>>
>> I definitely want to allow writes on non-owning references.
>> In order to properly support this, there needs to be a way to designate
>> a field as a "key":
>>
>>   struct node_data {
>>     long key __key;
>>     long data;
>>     struct bpf_rb_node node;
>>   };
>>
>> or perhaps on the rb_root via __contains or a separate tag:
>>
>>   struct bpf_rb_root groot __contains(struct node_data, node, key);
>>
>> This is necessary because the rbtree's less() uses the key field to
>> determine order, so we don't want to allow writes to the key field while
>> the node is in an rbtree. If such a write were possible, the rbtree
>> could easily be placed in an invalid state, since the new key may mean
>> that the rbtree is no longer sorted. Subsequent add() operations would
>> compare less() using the new key, so other nodes would be placed in the
>> wrong spot as well.
>>
>> Since PTR_UNTRUSTED currently allows read but not write, and prevents
>> use of a non-owning ref as a kfunc arg, it seemed a reasonable tag for
>> less() args.
>>
>> I was planning on adding __key / non-owning-ref write support as a
>> followup, but adding it as part of this series will probably save a lot
>> of back-and-forth. Will try to add it.
>
> Just a key mark might not be enough. less() could be doing all sorts of
> complex logic on more than one field and even global fields.
> But what is the concern with writing into 'key'?
> The rbtree will not be sorted. find/add operations will not be correct,
> but nothing will crash. At the end bpf_rb_root_free() will walk all
> unsorted nodes anyway and free them all.
> Even if we pass PTR_TRUSTED | MEM_RDONLY pointers into less(), less()
> can still do nonsensical things like returning random true/false.
> Doesn't look like an issue to me.

Agreed re: complex logic + global fields, less() being able to do
nonsensical things, and writing to key not crashing anything even if it
breaks the tree.

OK, let's forget about __key. In the next version of the series,
non-owning refs will be writeable. Can add more protection in the future
if it's deemed necessary.
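For concreteness, here's a minimal userspace sketch of the proposed __key
tagging from the discussion above. __key is not an existing kernel
annotation - it's modeled as a no-op macro here - and bpf_rb_node is a
dummy stand-in sized so the embedded node lands at offset 0x10, matching
the off=16 the verifier reports for less() args:

```c
#include <assert.h>
#include <stddef.h>

/* Proposed (hypothetical) field tag from the discussion; a no-op here
 * so the struct compiles. */
#define __key

/* Dummy stand-in for the kernel's struct bpf_rb_node. */
struct bpf_rb_node { unsigned long __opaque[2]; };

struct node_data {
	long key __key;          /* offset 0x0: field less() orders by  */
	long data;               /* offset 0x8                          */
	struct bpf_rb_node node; /* offset 0x10: embedded rbtree linkage */
};
```

The point of the tag would be to let the verifier reject writes to key
while the node is in a tree, without blocking writes to untagged fields
like data.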
Since this means non-owning refs won't be PTR_UNTRUSTED anymore, I can
split this patch out from the rest of the series after confirming that it
isn't necessary to ship rbtree. I still want to convince you that skipping
the check is correct before I page out the details, but that's less urgent
now. IIUC, although the cause of the issue is clear now, you'd still like
me to clarify the details of the solution.
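For reference, the codegen pattern discussed earlier in the thread can be
sketched in plain userspace C. The struct layout and container_of() below
are stand-ins chosen to match the -0x10 offsets in the disassembly;
bpf_rb_node is a dummy. key_via_temp() mirrors the hypothetical
intermediate-insn form (r3 = r2 - 0x10; r2 = *(u64 *)(r3)) and
key_folded() mirrors what LLVM actually emits (r2 = *(u64 *)(r2 - 0x10)) -
both compute the same address:

```c
#include <assert.h>
#include <stddef.h>

/* Dummy stand-in; layout puts the embedded node at offset 0x10. */
struct bpf_rb_node { unsigned long __opaque[2]; };

struct node_data {
	long key;                /* offset 0x0  */
	long data;               /* offset 0x8  */
	struct bpf_rb_node node; /* offset 0x10 */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The less() pattern from the example: args point at the embedded
 * bpf_rb_node; container_of() walks back to the enclosing node_data. */
static int less(struct bpf_rb_node *a, struct bpf_rb_node *b)
{
	struct node_data *node_a = container_of(a, struct node_data, node);
	struct node_data *node_b = container_of(b, struct node_data, node);

	return node_a->key < node_b->key;
}

/* Adjusted pointer materialized first, then dereferenced at off=0. */
static long key_via_temp(struct bpf_rb_node *n)
{
	struct node_data *d = container_of(n, struct node_data, node);
	return d->key;
}

/* Adjustment folded into the load itself: a negative-offset access
 * relative to the original arg pointer. */
static long key_folded(struct bpf_rb_node *n)
{
	return *(long *)((char *)n - 0x10);
}
```

Both helpers read the same memory; the difference is only in which form
the verifier sees, which is why the arg regs must carry the node_data
type with off=16 rather than bpf_rb_node with off=0.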