On Mon, Aug 8, 2022 at 5:25 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
> >
> > Thinking again. I guess the following scenario is possible:
> >
> > rcu_read_lock()
> > v = bpf_map_lookup_elem(&key);
> > t1 = v->field;
> > bpf_map_delete_elem(&key);
> > /* another bpf program triggering bpf_map_update_elem() */
> > /* the element with 'key' is reused */
> > /* the value is updated */
> > t2 = v->field;
> > ...
> > rcu_read_unlock()
> >
> > it is possible t1 and t2 may not be the same.
> > This should be an extremely corner case, not sure how to resolve
> > this easily without performance degradation.
>
> Yes, this is totally possible. This extreme corner case may also
> become a real problem in sleepable programs ;-).
>
> I had an idea in mind on how it can be done completely in the BPF
> program side without changing the map implementation.

+1, it's best to address it inside the program instead of complicating,
and likely slowing down, the maps.

As for sleepable progs, the issue is not present there, because sleepable
progs are only allowed to use preallocated maps. In non-prealloc maps
free_htab_elem() would just call_rcu() and the element would be freed
into the global slab, which would be bad if a bpf prog were doing
something like the above.

Doing call_rcu_tasks_trace() + call_rcu() would allow non-prealloc maps
in sleepable progs, but it doesn't scale, and non-prealloc is already
many times slower than prealloc due to atomic_inc/dec and call_rcu.

With the new bpf_mem_alloc I'm experimenting with batching of
call_rcu_tasks_trace + call_rcu, so the hash map can be dynamically
allocated, be just as fast as full prealloc, and finally be safe in both
sleepable and traditional progs.

I was thinking of adding a new SLAB_TYPESAFE_BY_RCU_TASKS_TRACE flag to
the slab, but decided to go that route only if batching of call_rcu*()
turns out to be insufficient.

Sorry it's taking so long to get the patches ready. Summer in WA is the
best time to take vacation :) I've been taking plenty.