Re: [RFC bpf-next v3 3/6] bpf: Introduce BPF_MA_REUSE_AFTER_RCU_GP

On Wed, May 03, 2023 at 04:39:01PM -0700, Martin KaFai Lau wrote:
> On 5/3/23 4:06 PM, Alexei Starovoitov wrote:
> > On Wed, May 03, 2023 at 02:57:03PM -0700, Martin KaFai Lau wrote:
> > > On 5/3/23 11:48 AM, Alexei Starovoitov wrote:
> > > > What it means is that sleepable progs using hashmap will be able to avoid uaf with bpf_rcu_read_lock().
> > > > Without explicit bpf_rcu_read_lock() it's still safe and equivalent to existing behavior of bpf_mem_alloc.
> > > > (while your proposed BPF_MA_FREE_AFTER_RCU_GP flavor is not safe to use in hashtab with sleepable progs)
> > > > 
> > > > After that we can unconditionally remove rcu_head/call_rcu from bpf_cpumask and improve usability of bpf_obj_drop.
> > > > Probably usage of bpf_mem_alloc in local storage can be simplified as well.
> > > > Martin wdyt?
> > > 
> > > If the bpf prog always does a bpf_rcu_read_lock() before accessing the
> > > (e.g.) task local storage, the reuse_now conditions in bpf_local_storage
> > > can be removed and bpf_mem_cache_free() called directly.
> > > 
> > > The only corner use case is when the bpf_prog or syscall does
> > > bpf_task_storage_delete() instead of having the task storage stay for the
> > > whole lifetime of the task_struct. Using REUSE_AFTER_RCU_GP will be a change
> > > of this uaf guarantee for the sleepable program, but it is still safe because
> > > the storage is freed after a tasks_trace gp. We could take this chance to align
> > > this behavior of the local storage map with the other bpf maps.
> > > 
> > > For BPF_MA_FREE_AFTER_RCU_GP, there are cases where the bpf local storage
> > > knows it can be freed without waiting for a tasks_trace gp. However, only
> > > task/cgroup storages are in bpf ma and I don't believe this optimization
> > > matters much for them. I would rather focus on the REUSE_AFTER_RCU_GP first.
> > 
> > I'm confused which REUSE_AFTER_RCU_GP you meant.
> > What I proposed above is REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace
> 
> Regarding REUSE_AFTER_RCU_GP, I meant
> REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace.
> 
> > 
> > Hou's proposals: 1. BPF_MA_REUSE_AFTER_two_RCUs_GP 2. BPF_MA_FREE_AFTER_single_RCU_GP
> 
> That is probably where the confusion is. I thought Hou's
> BPF_MA_REUSE_AFTER_RCU_GP is already
> REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace. From the commit message:

Sorry. My bad. You're correct.
The difference between mine and Hou's #1 is whether rcu_tasks_trace is global or per-cpu.

> 
> " ... So introduce BPF_MA_REUSE_AFTER_RCU_GP to solve these problems. For
> BPF_MA_REUSE_AFTER_GP, the freed objects are reused only after one RCU
> grace period and may be returned back to slab system after another
> RCU-tasks-trace grace period. ..."
> 
> [I assumed BPF_MA_REUSE_AFTER_GP is just a typo of BPF_MA_REUSE_AFTER_"RCU"_GP]
> 
> > 
> > If I'm reading bpf_local_storage correctly, it can remove the reuse_now logic
> > in all conditions with REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace.
> 
> Right, for smap->bpf_ma == true (cgroup and task storage), all reuse_now
> logic can be gone and bpf_mem_cache_free() can be used directly. Potentially the
> sk/inode storage can also move to bpf_ma after running some benchmarks. This will
> simplify things a lot. For sk storage, the reuse_now was there to avoid the
> unnecessary tasks_trace gp because a performance impact was reported on sk
> storage where connections can be opened and closed very frequently.
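
For readers following along, below is a minimal sketch of the sleepable-prog
pattern discussed above: guarding hashmap value access with bpf_rcu_read_lock()
so freed elements cannot be reused underneath the lookup once the allocator
provides REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace semantics. The LSM
hook, map name and value layout are made up for illustration; this is not code
from the series.

/* Illustrative sleepable LSM prog; map and hook chosen arbitrarily. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(map_flags, BPF_F_NO_PREALLOC);	/* non-prealloc hash uses bpf_mem_alloc */
	__uint(max_entries, 1024);
	__type(key, __u32);
	__type(value, __u64);
} open_cnt SEC(".maps");

SEC("lsm.s/file_open")	/* sleepable prog */
int BPF_PROG(file_open, struct file *file)
{
	__u32 key = 0;
	__u64 *val;

	bpf_rcu_read_lock();
	/* Inside the RCU read-side critical section the element returned by
	 * the lookup cannot be reused before an RCU GP elapses, so the update
	 * below does not race with element reuse even in a sleepable prog.
	 */
	val = bpf_map_lookup_elem(&open_cnt, &key);
	if (val)
		__sync_fetch_and_add(val, 1);
	bpf_rcu_read_unlock();
	return 0;
}

char _license[] SEC("license") = "GPL";

Without the bpf_rcu_read_lock()/bpf_rcu_read_unlock() pair, the sleepable prog
simply falls back to the existing bpf_mem_alloc behavior, as Alexei notes above.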


