Re: [PATCH bpf-next 10/13] bpf, x86: BPF_PROBE_MEM handling for insn->off < 0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/7/22 7:47 PM, Alexei Starovoitov wrote:
> On Wed, Dec 07, 2022 at 06:39:38PM -0500, Dave Marchevsky wrote:
>>>>
>>>> 0000000000000000 <less>:
>>>> ;       return node_a->key < node_b->key;
>>>>        0:       79 22 f0 ff 00 00 00 00 r2 = *(u64 *)(r2 - 0x10)
>>>>        1:       79 11 f0 ff 00 00 00 00 r1 = *(u64 *)(r1 - 0x10)
>>>>        2:       b4 00 00 00 01 00 00 00 w0 = 0x1
>>>> ;       return node_a->key < node_b->key;
>>>
>>> I see. That's the same bug.
>>> The args to callback should have been PTR_TO_BTF_ID | PTR_TRUSTED with 
>>> correct positive offset.
>>> Then node_a = container_of(a, struct node_data, node);
>>> would have produced correct offset into proper btf_id.
>>>
>>> The verifier should be passing into less() the btf_id
>>> of struct node_data instead of btf_id of struct bpf_rb_node.
>>>
>>
>> The verifier is already passing the struct node_data type, not bpf_rb_node.
>> For less() args, and rbtree_{first,remove} retval, mark_reg_datastructure_node
>> - added in patch 8 - is doing as you describe.
>>
>> Verifier sees less' arg regs as R=ptr_to_node_data(off=16). If it was
>> instead passing R=ptr_to_bpf_rb_node(off=0), attempting to access *(reg - 0x10)
>> would cause verifier err.
> 
> Ahh. I finally got it :)
> Please put these details in the commit log when you respin.
> 

Glad it finally started making sense.
Will do big improvement of patch summary after addressing other
feedback from this series.

>>>>        3:       cd 21 01 00 00 00 00 00 if r1 s< r2 goto +0x1 <LBB2_2>
>>>>        4:       b4 00 00 00 00 00 00 00 w0 = 0x0
>>>>
>>>> 0000000000000028 <LBB2_2>:
>>>> ;       return node_a->key < node_b->key;
>>>>        5:       95 00 00 00 00 00 00 00 exit
>>>>
>>>> Insns 0 and 1 are loading node_b->key and node_a->key, respectively, using
>>>> negative insn->off. Verifier's view or R1 and R2 before insn 0 is
>>>> untrusted_ptr_node_data(off=16). If there were some intermediate insns
>>>> storing result of container_of() before dereferencing:
>>>>
>>>>   r3 = (r2 - 0x10)
>>>>   r2 = *(u64 *)(r3)
>>>>
>>>> Verifier would see R3 as untrusted_ptr_node_data(off=0), and load for
>>>> r2 would have insn->off = 0. But LLVM decides to just do a load-with-offset
>>>> using original arg ptrs to less() instead of storing container_of() ptr
>>>> adjustments.
>>>>
>>>> Since the container_of usage and code pattern in above example's less()
>>>> isn't particularly specific to this series, I think there are other scenarios
>>>> where such code would be generated and considered this a general bugfix in
>>>> cover letter.
>>>
>>> imo the negative offset looks specific to two misuses of PTR_UNTRUSTED in this set.
>>>
>>
>> If I used PTR_TRUSTED here, the JITted instructions would still do a load like
>> r2 = *(u64 *)(r2 - 0x10). There would just be no BPF_PROBE_MEM runtime checking
>> insns generated, avoiding negative insn issue there. But the negative insn->off
>> load being generated is not specific to PTR_UNTRUSTED.
> 
> yep.
> 
>>>
>>> Exactly. More flags will only increase the confusion.
>>> Please try to make callback args as proper PTR_TRUSTED and disallow calling specific
>>> rbtree kfuncs while inside this particular callback to prevent recursion.
>>> That would solve all these issues, no?
>>> Writing into such PTR_TRUSTED should be still allowed inside cb though it's bogus.
>>>
>>> Consider less() receiving btf_id ptr_trusted of struct node_data and it contains
>>> both link list and rbtree.
>>> It should still be safe to operate on link list part of that node from less()
>>> though it's not something we would ever recommend.
>>
>> I definitely want to allow writes on non-owning references. In order to properly
>> support this, there needs to be a way to designate a field as a "key":
>>
>> struct node_data {
>>   long key __key;
>>   long data;
>>   struct bpf_rb_node node;
>> };
>>
>> or perhaps on the rb_root via __contains or separate tag:
>>
>> struct bpf_rb_root groot __contains(struct node_data, node, key);
>>
>> This is necessary because rbtree's less() uses key field to determine order, so
>> we don't want to allow write to the key field when the node is in a rbtree. If
>> such a write were possible the rbtree could easily be placed in an invalid state
>> since the new key may mean that the rbtree is no longer sorted. Subsequent add()
>> operations would compare less() using the new key, so other nodes will be placed
>> in wrong spot as well.
>>
>> Since PTR_UNTRUSTED currently allows read but not write, and prevents use of
>> non-owning ref as kfunc arg, it seemed to be reasonable tag for less() args.
>>
>> I was planning on adding __key / non-owning-ref write support as a followup, but
>> adding it as part of this series will probably save a lot of back-and-forth.
>> Will try to add it.
> 
> Just key mark might not be enough. less() could be doing all sort of complex
> logic on more than one field and even global fields.
> But what is the concern with writing into 'key' ?
> The rbtree will not be sorted. find/add operation will not be correct,
> but nothing will crash. At the end bpf_rb_root_free() will walk all
> unsorted nodes anyway and free them all.
> Even if we pass PTR_TRUSTED | MEM_RDONLY pointers into less() the less()
> can still do nonsensical things like returning random true/false.
> Doesn't look like an issue to me.

Agreed re: complex logic + global fields, less() being able to do nonsensical
things, and writing to key not crashing anything even if it breaks the tree.

OK, let's forget about __key. In next version of the series non-owning refs
will be write-able. Can add more protection in the future if it's deemed
necessary. Since this means non-owning refs won't be PTR_UNTRUSTED anymore,
I can split this patch out from the rest of the series after confirming that
it isn't necessary to ship rbtree.

Still want to convince you that the skipping of a check is correct before
I page out the details, but less urgent now. IIUC although the cause of the
issue is clear now, you'd still like me to clarify the details of solution.



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux