Hi Toke, On 2/6/2025 11:05 PM, Toke Høiland-Jørgensen wrote: > Hou Tao <houtao@xxxxxxxxxxxxxxx> writes: > >> +cc Cody Haas >> >> Sorry for the resend. I sent the reply in the HTML format. >> >> On 2/4/2025 4:28 PM, Hou Tao wrote: >>> Currently, the update of existing element in hash map involves two >>> steps: >>> 1) insert the new element at the head of the hash list >>> 2) remove the old element >>> >>> It is possible that the concurrent lookup operation may fail to find >>> either the old element or the new element if the lookup operation starts >>> before the addition and continues after the removal. >>> >>> Therefore, replacing the two-step update with an atomic update. After >>> the change, the update will be atomic in the perspective of the lookup >>> operation: it will either find the old element or the new element. >>> >>> Signed-off-by: Hou Tao <hotforest@xxxxxxxxx> >>> --- >>> kernel/bpf/hashtab.c | 14 ++++++++------ >>> 1 file changed, 8 insertions(+), 6 deletions(-) >>> >>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c >>> index 4a9eeb7aef85..a28b11ce74c6 100644 >>> --- a/kernel/bpf/hashtab.c >>> +++ b/kernel/bpf/hashtab.c >>> @@ -1179,12 +1179,14 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, >>> goto err; >>> } >>> >>> - /* add new element to the head of the list, so that >>> - * concurrent search will find it before old elem >>> - */ >>> - hlist_nulls_add_head_rcu(&l_new->hash_node, head); >>> - if (l_old) { >>> - hlist_nulls_del_rcu(&l_old->hash_node); >>> + if (!l_old) { >>> + hlist_nulls_add_head_rcu(&l_new->hash_node, head); >>> + } else { >>> + /* Replace the old element atomically, so that >>> + * concurrent search will find either the new element or >>> + * the old element. >>> + */ >>> + hlist_nulls_replace_rcu(&l_new->hash_node, &l_old->hash_node); >>> >>> /* l_old has already been stashed in htab->extra_elems, free >>> * its special fields before it is available for reuse. Also >>> >> After thinking about it the second time, the atomic list replacement on >> the update side is enough to make lookup operation always find the >> existing element. However, due to the immediate reuse, the lookup may >> find an unexpected value. Maybe we should disable the immediate reuse >> for specific map (e.g., htab of maps). > Hmm, in an RCU-protected data structure, reusing the memory before an > RCU grace period has elapsed is just as wrong as freeing it, isn't it? > I.e., the reuse logic should have some kind of call_rcu redirection to > be completely correct? Not for all cases. There is SLAB_TYPESAFE_BY_RCU-typed slab. For hash map, the reuse is also tricky (e.g., the goto again case in lookup_nulls_elem_raw), however it can not prevent the lookup procedure from returning unexpected value. I had post a patch set [1] to "fix" that, but Alexei said it is "a known quirk". Here I am not sure about whether it is reasonable to disable the reuse for htab of maps only. I will post a v2 for the patch set. [1]: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@xxxxxxxxxxxxxxx/ > -Toke