On Fri, Aug 30, 2019 at 06:39:48AM +0000, Yonghong Song wrote:
> >
> > The problem happens when you are trying to do batch lookup on a
> > hashmap and when executing bpf_map_get_next_key(map, key, next_key)
> > the key is removed, then that call will return the first key and you'd
> > start iterating the map from the beginning again and retrieve
> > duplicate information.
>
> Right. Maybe we can have another bpf_map_ops callback function
> like 'map_batch_get_next_key' which won't fall back to the
> first key if the 'key' is not available in the hash table?

The reason I picked this get_next_key behavior long ago is that I
couldn't come up with a way to pick the next key consistently.
In a hash table the elements are not sorted. If there is more than
one element in a hash table bucket, they are added to the link list
in sort-of random order. If one out of N elems in the bucket is
deleted, which one should be picked next?
select_bucket() picks the bucket. If lookup_nulls_elem_raw() cannot
find the element, which one in the link list is the "right one" to
continue from?
Iterating over a hash table without duplicates while elements are
being added and removed in parallel is a hard problem to solve.
I think "best effort" is the right answer.
When users care about consistency they should use map-in-map.
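
For illustration, here is a minimal user-space sketch of the restart
hazard being discussed. It assumes a hypothetical fd `map_fd` for a
BPF_MAP_TYPE_HASH with u32 keys and u64 values, and uses the libbpf
syscall wrappers; it is not part of any proposed patch.

#include <errno.h>
#include <stdio.h>
#include <linux/types.h>
#include <bpf/bpf.h>

static void walk_map(int map_fd)
{
	__u32 key, next_key;
	__u64 value;
	int err;

	/* Passing NULL asks the kernel for the first key in the map. */
	err = bpf_map_get_next_key(map_fd, NULL, &next_key);
	while (!err) {
		key = next_key;
		if (!bpf_map_lookup_elem(map_fd, &key, &value))
			printf("key %u -> value %llu\n", key,
			       (unsigned long long)value);

		/* If another thread deletes 'key' before this call,
		 * lookup_nulls_elem_raw() cannot find it in the bucket's
		 * link list and the kernel falls back to returning the
		 * first key, so this loop silently restarts from the
		 * beginning and reports duplicates.
		 */
		err = bpf_map_get_next_key(map_fd, &key, &next_key);
	}
	if (errno != ENOENT)	/* ENOENT means end of map was reached. */
		perror("bpf_map_get_next_key");
}

The loop itself is the standard iteration pattern; the "best effort"
semantics only show up when deletions race with it.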