On Tue, 9 Aug 2022 at 05:18, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Aug 8, 2022 at 5:25 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
> > >
> > > Thinking again. I guess the following scenario is possible:
> > >
> > > rcu_read_lock()
> > > v = bpf_map_lookup_elem(&key);
> > > t1 = v->field;
> > > bpf_map_delete_elem(&key);
> > > /* another bpf program triggering bpf_map_update_elem() */
> > > /* the element with 'key' is reused */
> > > /* the value is updated */
> > > t2 = v->field;
> > > ...
> > > rcu_read_unlock()
> > >
> > > it is possible t1 and t2 may not be the same.
> > > This should be an extremely corner case, not sure how to resolve
> > > this easily without performance degradation.
> >
> > Yes, this is totally possible. This extreme corner case may also
> > become a real problem in sleepable programs ;-).
> >
> > I had an idea in mind on how it can be done completely in the BPF
> > program side without changing the map implementation.
>
> +1 it's best to address it inside the program instead
> of complicating and likely slowing down maps.
>
> As far as sleepable progs.. the issue is not present
> because sleepable progs are only allowed to use preallocated maps.

The problem being discussed in this thread is exactly with preallocated
maps, so I do think the issue is present with sleepable progs. I.e. a
sleepable prog can do a lookup, go to sleep (implicitly or explicitly),
and then find the element has already been reused (the window is wider,
so it is more likely). It's not a smart thing to do, of course, but my
point was that, compared to the non-sleepable case, it's no longer the
kind of corner case you'd rarely hit.

> In non-prealloc maps free_htab_elem() would just call_rcu()
> and the element would be freed into the global slab which
> will be bad if bpf prog is doing something like above.
>
> Doing call_rcu_tasks_trace() + call_rcu() would allow
> non-prealloc in sleepable, but it doesn't scale
> and non-prealloc is already many times slower comparing
> to prealloc due to atomic_inc/dec and call_rcu.

Exactly, which means the element will be readily reused in the
sleepable case since it only uses preallocated maps, which takes us
back to the problem above.

>
> With new bpf_mem_alloc I'm experimenting with batching
> of call_rcu_tasks_trace + call_rcu, so hash map can be
> dynamically allocated, just as fast as full prealloc and
> finally safe in both sleepable and traditional progs.
> I was thinking of adding a new SLAB_TYPESAFE_BY_RCU_TASKS_TRACE
> flag to the slab, but decided to go that route only if
> batching of call_rcu*() won't be sufficient.

Interesting stuff, looking forward to it!

> Sorry it takes so long to get the patches ready.
> Summer in WA is the best time to take vacation :)
> I've been taking plenty.

Well, I think this thread was just me and Yonghong contemplating other
possible fixes to the original bug, so there's no rush, enjoy
yourself! :).