Re: [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 2/15/2023 9:54 AM, Martin KaFai Lau wrote:
> On 2/11/23 8:34 AM, Alexei Starovoitov wrote:
>> On Sat, Feb 11, 2023 at 8:33 AM Alexei Starovoitov
>> <alexei.starovoitov@xxxxxxxxx> wrote:
>>>
>>> On Fri, Feb 10, 2023 at 5:10 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>>>>> Hou, are you plannning to resubmit this change? I also hit this while
>>>>>> testing my
>>>>>> changes on bpf-next.
>>>>> Are you talking about the whole patch set or just GFP_ZERO in mem_alloc?
>>>>> The former will take a long time to settle.
>>>>> The latter is trivial.
>>>>> To unblock yourself just add GFP_ZERO in an extra patch?
>>>> Sorry for the long delay. Just find find out time to do some tests to compare
>>>> the performance of bzero and ctor. After it is done, will resubmit on next
>>>> week.
>>>
>>> I still don't like ctor as a concept. In general the callbacks in the critical
>>> path are guaranteed to be slow due to retpoline overhead.
>>> Please send a patch to add GFP_ZERO.
>>>
>>> Also I realized that we can make the BPF_REUSE_AFTER_RCU_GP flag usable
>>> without risking OOM by only waiting for normal rcu GP and not rcu_tasks_trace.
>>> This approach will work for inner nodes of qptrie, since bpf progs
>>> never see pointers to them. It will work for local storage
>>> converted to bpf_mem_alloc too. It wouldn't need to use its own call_rcu.
>>> It's also safe without uaf caveat in sleepable progs and sleepable progs
>>
>> I meant 'safe with uaf caveat'.
>> Safe because we wait for rcu_task_trace later before returning to kernel memory.
For qp-trie, I had added reuse checking for qp-trie inner node by adding a
version in both child pointer and child node itself and it seemed works, but
using BPF_REUSE_AFTER_RCU_GP for inner node will be much simpler for the
implementation. And it seems for qp-trie leaf node, BPF_REUSE_AFTER_RCU_GP is
needed as well, else the value returned to caller in bpf program or syscall may
be reused just like hash-table during its usage. We can change qp-trie to act as
a set only (e.g., no value), but that will limit its usage scenario. Maybe
requiring the caller to use bpf_rcu_read_lock() is solution as well. What do you
think ?
>
> For local storage, when its owner (sk/task/inode/cgrp) is going away, the
> memory can be reused immediately. No rcu gp is needed.
Now it seems it will wait for RCU GP and i think it is still necessary, because
when the process exits, other processes may still access the local storage
through pidfd or task_struct of the exited process.
>
> The local storage delete case (eg. bpf_sk_storage_delete) is the only one that
> needs to be freed by tasks_trace gp because another bpf prog (reader) may be
> under the rcu_read_lock_trace(). I think the idea (BPF_REUSE_AFTER_RCU_GP) on
> allowing reuse after vanilla rcu gp and free (if needed) after tasks_trace gp
> can be extended to the local storage delete case. I think we can extend the
> assumption that "sleepable progs (reader) can use explicit bpf_rcu_read_lock()
> when they want to avoid uaf" to bpf_{sk,task,inode,cgrp}_storage_get() also.
>
It seems bpf_rcu_read_lock() & bpf_rcu_read_unlock() will be used to protect not
only bpf_task_storage_get(), but also the dereferences of the returned local
storage ptr, right ? I think qp-trie may also need this.
> I also need the GFP_ZERO in bpf_mem_alloc, so will work on the GFP_ZERO and
> the BPF_REUSE_AFTER_RCU_GP idea.  Probably will get the GFP_ZERO out first.
I will continue work on this patchset for GFP_ZERO and reuse flag. Do you mean
that you want to work together to implement BPF_REUSE_AFTER_RCU_GP ? How do we
cooperate together to accomplish that ?


>
>>
>>> can use explicit bpf_rcu_read_lock() when they want to avoid uaf.
>>> So please respin the set with rcu gp only and that new flag.




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux