On Wed, May 03, 2023 at 04:39:01PM -0700, Martin KaFai Lau wrote:
> On 5/3/23 4:06 PM, Alexei Starovoitov wrote:
> > On Wed, May 03, 2023 at 02:57:03PM -0700, Martin KaFai Lau wrote:
> > > On 5/3/23 11:48 AM, Alexei Starovoitov wrote:
> > > > What it means is that sleepable progs using a hashmap will be able
> > > > to avoid uaf with bpf_rcu_read_lock().
> > > > Without an explicit bpf_rcu_read_lock() it's still safe and
> > > > equivalent to the existing behavior of bpf_mem_alloc.
> > > > (while your proposed BPF_MA_FREE_AFTER_RCU_GP flavor is not safe to
> > > > use in hashtab with sleepable progs)
> > > >
> > > > After that we can unconditionally remove rcu_head/call_rcu from
> > > > bpf_cpumask and improve the usability of bpf_obj_drop.
> > > > Probably the usage of bpf_mem_alloc in local storage can be
> > > > simplified as well. Martin wdyt?
> > >
> > > If the bpf prog always does a bpf_rcu_read_lock() before accessing the
> > > (e.g.) task local storage, it can remove the reuse_now conditions in
> > > bpf_local_storage and directly call bpf_mem_cache_free().
> > >
> > > The only corner use case is when the bpf_prog or syscall does
> > > bpf_task_storage_delete() instead of having the task storage stay for
> > > the whole lifetime of the task_struct. Using REUSE_AFTER_RCU_GP will
> > > be a change of this uaf guarantee for the sleepable program, but it is
> > > still safe because it is freed after a tasks_trace gp. We could take
> > > this chance to align this behavior of the local storage map with the
> > > other bpf maps.
> > >
> > > For BPF_MA_FREE_AFTER_RCU_GP, there are cases where the bpf local
> > > storage knows it can be freed without waiting for a tasks_trace gp.
> > > However, only task/cgroup storages are in bpf ma and I don't believe
> > > this optimization matters much for them. I would rather focus on
> > > REUSE_AFTER_RCU_GP first.
> >
> > I'm confused which REUSE_AFTER_RCU_GP you meant.
> > What I proposed above is REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace
>
> Regarding REUSE_AFTER_RCU_GP, I meant
> REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace.
>
> >
> > Hou's proposals:
> > 1. BPF_MA_REUSE_AFTER_two_RCUs_GP
> > 2. BPF_MA_FREE_AFTER_single_RCU_GP
>
> That is probably where the confusion is. I thought Hou's
> BPF_MA_REUSE_AFTER_RCU_GP is already
> REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace. From the commit message:

Sorry. My bad. You're correct.
The difference between mine and Hou's #1 is whether rcu_tasks_trace is
global or per-cpu.

> " ... So introduce BPF_MA_REUSE_AFTER_RCU_GP to solve these problems. For
> BPF_MA_REUSE_AFTER_GP, the freed objects are reused only after one RCU
> grace period and may be returned back to slab system after another
> RCU-tasks-trace grace period. ..."
>
> [I assumed BPF_MA_REUSE_AFTER_GP is just a typo of BPF_MA_REUSE_AFTER_"RCU"_GP]
>
> >
> > If I'm reading bpf_local_storage correctly it can remove reuse_now logic
> > in all conditions with REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace.
>
> Right, for smap->bpf_ma == true (cgroup and task storage), all reuse_now
> logic can be gone and bpf_mem_cache_free() can be used directly.
> Potentially the sk/inode storages can also move to bpf_ma after running
> some benchmarks. This will simplify things a lot. For sk storage, the
> reuse_now was there to avoid the unnecessary tasks_trace gp because a
> performance impact was reported on sk storage, where connections can be
> opened and closed very frequently.
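
For illustration, a minimal sketch of the access pattern discussed above:
a sleepable program taking bpf_rcu_read_lock()/bpf_rcu_read_unlock() around
its task local storage access. The attach point, map name and value layout
are made up for the example and are not taken from this thread or the tree:

/* Sketch only: sleepable LSM prog accessing task local storage inside an
 * RCU read-side critical section.  Names below are illustrative.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct val {
	long cnt;
};

struct {
	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, struct val);
} task_map SEC(".maps");

void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;

SEC("lsm.s/file_open")			/* sleepable */
int BPF_PROG(on_file_open, struct file *file)
{
	struct task_struct *task = bpf_get_current_task_btf();
	struct val *v;

	/* The storage value is only touched inside the RCU read section,
	 * matching the "bpf_rcu_read_lock() before accessing the task
	 * local storage" pattern above.
	 */
	bpf_rcu_read_lock();
	v = bpf_task_storage_get(&task_map, task, 0,
				 BPF_LOCAL_STORAGE_GET_F_CREATE);
	if (v)
		v->cnt++;
	bpf_rcu_read_unlock();

	return 0;
}

char _license[] SEC("license") = "GPL";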