Hi, On 5/4/2023 7:39 AM, Martin KaFai Lau wrote: > On 5/3/23 4:06 PM, Alexei Starovoitov wrote: >> On Wed, May 03, 2023 at 02:57:03PM -0700, Martin KaFai Lau wrote: >>> On 5/3/23 11:48 AM, Alexei Starovoitov wrote: SNIP >>> >>> If the bpf prog always does a bpf_rcu_read_lock() before accessing the >>> (e.g.) task local storage, it can remove the reuse_now conditions in >>> the >>> bpf_local_storage and directly call the bpf_mem_cache_free(). >>> >>> The only corner use case is when the bpf_prog or syscall does >>> bpf_task_storage_delete() instead of having the task storage stays >>> with the >>> whole lifetime of the task_struct. Using REUSE_AFTER_RCU_GP will be >>> a change >>> of this uaf guarantee to the sleepable program but it is still safe >>> because >>> it is freed after tasks_trace gp. We could take this chance to align >>> this >>> behavior of the local storage map to the other bpf maps. >>> >>> For BPF_MA_FREE_AFTER_RCU_GP, there are cases that the bpf local >>> storage >>> knows it can be freed without waiting tasks_trace gp. However, only >>> task/cgroup storages are in bpf ma and I don't believe this >>> optimization >>> matter much for them. I would rather focus on the REUSE_AFTER_RCU_GP >>> first. OK. >> >> I'm confused which REUSE_AFTER_RCU_GP you meant. >> What I proposed above is >> REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace > > Regarding REUSE_AFTER_RCU_GP, I meant > REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace. > >> >> Hou's proposals: 1. BPF_MA_REUSE_AFTER_two_RCUs_GP 2. >> BPF_MA_FREE_AFTER_single_RCU_GP > > It probably is where the confusion is. I thought Hou's > BPF_MA_REUSE_AFTER_RCU_GP is already > REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace. From the commit > message: > > " ... So introduce BPF_MA_REUSE_AFTER_RCU_GP to solve these problems. For > BPF_MA_REUSE_AFTER_GP, the freed objects are reused only after one RCU > grace period and may be returned back to slab system after another > RCU-tasks-trace grace period. ..." > > [I assumed BPF_MA_REUSE_AFTER_GP is just a typo of > BPF_MA_REUSE_AFTER_"RCU"_GP] Yes. Now the implementation of BPF_MA_REUSE_AFTER_RCU_GP is already being REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace. It moves the free objects to reuse_ready_head list after one RCU GP, splices the elements in reuse_ready_head to wait_for_free when reuse_ready_head is not empty and frees these elements in wait_for_free by call_rcu_tasks_trace(). > >> >> If I'm reading bpf_local_storage correctly it can remove reuse_now logic >> in all conditions with >> REUSE_AFTER_rcu_GP_and_free_after_rcu_tasks_trace. > > Right, for smap->bpf_ma == true (cgroup and task storage), all > reuse_now logic can be gone and directly use the bpf_mem_cache_free(). > Potentially the sk/inode can also move to bpf_ma after running some > benchmark. This will simplify things a lot. For sk storage, the > reuse_now was there to avoid the unnecessary tasks_trace gp because > performance impact was reported on sk storage where connections can be > open-and-close very frequently.