On Thu, Feb 16, 2023 at 5:19 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > > Hi, > > On 2/17/2023 12:35 AM, Alexei Starovoitov wrote: > > On Thu, Feb 16, 2023 at 5:55 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > >> Beside BPF_REUSE_AFTER_RCU_GP, is BPF_FREE_AFTER_RCU_GP a feasible solution ? > > The idea is for bpf_mem_free to wait normal RCU GP before adding > > the elements back to the free list and free the elem to global kernel memory > > only after both rcu and rcu_tasks_trace GPs as it's doing now. > > > >> Its downside is that it will enforce sleep-able program to use > >> bpf_rcu_read_{lock,unlock}() to access these returned pointers ? > > sleepable can access elems without kptrs/spin_locks > > even when not using rcu_read_lock, since it's safe, but there is uaf. > > Some progs might be fine with it. > > When sleepable needs to avoid uaf they will use bpf_rcu_read_lock. > Thanks for the explanation for BPF_REUSE_AFTER_RCU_GP. It seems that > BPF_REUSE_AFTER_RCU_GP may incur OOM easily, because before the expiration of > one RCU GP, these freed elements will not available to both bpf ma or slab > subsystem and after the expiration of RCU GP, these freed elements are only > available for one bpf ma but the number of these freed elements maybe too many > for one bpf ma, so part of these freed elements will be freed through > call_rcu_tasks_trace() and these freed-again elements will not be available for > slab subsystem untill the expiration of tasks trace RCU. In brief, after one RCU > GP, part of these freed elements will be reused, but the majority of these > elements will still be freed through call_rcu_tasks_trace(). Due to the doubt > above, I proposed BPF_FREE_AFTER_RCU to directly free these elements after one > RCU GP and enforce sleepable program to use bpf_rcu_read_lock() to access these > elements, but the enforcement will break the existing sleepable programs, so > BPF_FREE_AFTER_GP is still not a good idea. I will check whether or not these is > still OOM risk for BPF_REUSE_AFTER_RCU_GP and try to mitigate if it is possible > (e.g., share these freed elements between all bpf ma instead of one bpf ma which > free it). Since BPF_REUSE_AFTER_RCU_GP is a new thing that will be used by qptrie map and maybe? local storage, there is no sleepable breakage. If we start using BPF_REUSE_AFTER_RCU_GP for hashmaps with kptrs and enforce bpf_rcu_read_lock() this is also ok, since kptrs are unstable. I prefer to avoid complicating bpf ma with sharing free lists across all ma-s. Unless this is really trivial code that is easy to review.