On Thu, Feb 11, 2021 at 01:25:00AM +0100, Daniel Borkmann wrote:
> On 2/10/21 6:56 PM, Denis Salopek wrote:
> > On Fri, Jan 29, 2021 at 04:58:12PM +0100, Daniel Borkmann wrote:
> > > On 1/27/21 6:12 PM, Denis Salopek wrote:
> > > > Extend the existing bpf_map_lookup_and_delete_elem() functionality to
> > > > hashtab maps, in addition to stacks and queues.
> > > > Create a new hashtab bpf_map_ops function that does lookup and deletion
> > > > of the element under the same bucket lock and add the created map_ops to
> > > > bpf.h.
> > > > Add the appropriate test case to 'maps' selftests.
> > > >
> > > > Signed-off-by: Denis Salopek <denis.salopek@xxxxxxxxxx>
> > > > Cc: Juraj Vijtiuk <juraj.vijtiuk@xxxxxxxxxx>
> > > > Cc: Luka Oreskovic <luka.oreskovic@xxxxxxxxxx>
> > > > Cc: Luka Perkov <luka.perkov@xxxxxxxxxx>
> > >
> > > This is already possible in a more generic form, that is, via bpf(2) cmd
> > > BPF_MAP_LOOKUP_AND_DELETE_BATCH which is implemented by the different htab
> > > map flavors which also take the bucket lock. Did you have a try at that?
> > >
> > > > ---
> > > >  include/linux/bpf.h                     |  1 +
> > > >  kernel/bpf/hashtab.c                    | 38 +++++++++++++++++++++++++
> > > >  kernel/bpf/syscall.c                    |  9 ++++++
> > > >  tools/testing/selftests/bpf/test_maps.c |  7 +++++
> > > >  4 files changed, 55 insertions(+)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index 1aac2af12fed..003c1505f0e3 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -77,6 +77,7 @@ struct bpf_map_ops {
> > > >  	/* funcs callable from userspace and from eBPF programs */
> > > >  	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
> > > > +	int (*map_lookup_and_delete_elem)(struct bpf_map *map, void *key, void *value);
> > > >  	int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);
> > > >  	int (*map_delete_elem)(struct bpf_map *map, void *key);
> > > >  	int (*map_push_elem)(struct bpf_map *map, void *value, u64 flags);
> > > > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > > > index c1ac7f964bc9..8d8463e0ea34 100644
> > > > --- a/kernel/bpf/hashtab.c
> > > > +++ b/kernel/bpf/hashtab.c
> > > > @@ -973,6 +973,43 @@ static int check_flags(struct bpf_htab *htab, struct htab_elem *l_old,
> > > >  	return 0;
> > > >  }
> > > >
> > > > +/* Called from syscall or from eBPF program */
> > > > +static int htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key, void *value)
> > > > +{
> > > > +	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> > > > +	struct hlist_nulls_head *head;
> > > > +	struct bucket *b;
> > > > +	struct htab_elem *l;
> > > > +	unsigned long flags;
> > > > +	u32 hash, key_size;
> > > > +	int ret;
> > > > +
> > > > +	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
> > > > +
> > > > +	key_size = map->key_size;
> > > > +
> > > > +	hash = htab_map_hash(key, key_size, htab->hashrnd);
> > > > +	b = __select_bucket(htab, hash);
> > > > +	head = &b->head;
> > > > +
> > > > +	ret = htab_lock_bucket(htab, b, hash, &flags);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	l = lookup_elem_raw(head, hash, key, key_size);
> > > > +
> > > > +	if (l) {
> > > > +		copy_map_value(map, value, l->key + round_up(key_size, 8));
> > > > +		hlist_nulls_del_rcu(&l->hash_node);
> > > > +		free_htab_elem(htab, l);
> > > > +	} else {
> > > > +		ret = -ENOENT;
> > > > +	}
> > > > +
> > > > +	htab_unlock_bucket(htab, b, hash, flags);
> > > > +	return ret;
> > > > +}
> > > > +
> > > >  /* Called from syscall or from eBPF program */
> > > >  static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
> > > >  				u64 map_flags)
> > > > @@ -1877,6 +1914,7 @@ const struct bpf_map_ops htab_map_ops = {
> > > >  	.map_free = htab_map_free,
> > > >  	.map_get_next_key = htab_map_get_next_key,
> > > >  	.map_lookup_elem = htab_map_lookup_elem,
> > > > +	.map_lookup_and_delete_elem = htab_map_lookup_and_delete_elem,
> > > >  	.map_update_elem = htab_map_update_elem,
> > > >  	.map_delete_elem = htab_map_delete_elem,
> > > >  	.map_gen_lookup = htab_map_gen_lookup,
> > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > index e5999d86c76e..4ff45c8d1077 100644
> > > > --- a/kernel/bpf/syscall.c
> > > > +++ b/kernel/bpf/syscall.c
> > > > @@ -1505,6 +1505,15 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
> > > >  	if (map->map_type == BPF_MAP_TYPE_QUEUE ||
> > > >  	    map->map_type == BPF_MAP_TYPE_STACK) {
> > > >  		err = map->ops->map_pop_elem(map, value);
> > > > +	} else if (map->map_type == BPF_MAP_TYPE_HASH) {
> > > > +		if (!bpf_map_is_dev_bound(map)) {
> > > > +			bpf_disable_instrumentation();
> > > > +			rcu_read_lock();
> > > > +			err = map->ops->map_lookup_and_delete_elem(map, key, value);
> > > > +			rcu_read_unlock();
> > > > +			bpf_enable_instrumentation();
> > > > +			maybe_wait_bpf_programs(map);
> > > > +		}
> > > >  	} else {
> > > >  		err = -ENOTSUPP;
> > > >  	}
> >
> > Hi,
> >
> > Is there something wrong with adding hashmap functionality to the
> > map_lookup_and_delete_elem() function? As per commit
> > bd513cd08f10cbe28856f99ae951e86e86803861, adding this functionality for
> > other map types was the plan at the time this function was implemented,
> > so wouldn't this addition be useful? Otherwise, this function would only
> > be used for stack/queue popping. So why does it even exist in the first
> > place, if the _batch function can be used instead?
> >
> > I understand the _batch function can do the same job, but is there a
> > reason why using it would be a better option? There is also a
> > performance impact in the _batch function when compared to my function -
> > not too big, but still worth the mention. In my benchmarks, the _batch
> > function takes 15% longer for lookup and deletion of one element.
>
> Thanks for the benchmark, out of curiosity do you happen to have numbers
> at what point it pays off (2/3/more elems)? I don't think anything speaks
> against single lookup + delete functionality given the cmd API is already
> set in stone and it might be easier to use (though libbpf could also hide
> this behind api), but it would be nice if we could ideally reuse the batch
> lookup + delete code (e.g. bpf_map_do_batch()) in order to get this
> transparently for free for the various other map types as well.
>
> Thanks,
> Daniel

Hi Daniel,

I made a couple of benchmarks with different batch sizes and noticed that
the _batch function's performance depends on the size of the map we're
working on - the number of elements at which it "pays off" depends on the
map's max_entries.
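
In the numbers below, "nobatch" is a loop of single
BPF_MAP_LOOKUP_AND_DELETE_ELEM calls (the op added by this patch) and
"batch" is one BPF_MAP_LOOKUP_AND_DELETE_BATCH call covering the same
elements. Roughly, the two userspace loops being timed look like the
following sketch (assuming libbpf's bpf_map_lookup_and_delete_elem()/
_batch() wrappers; the key/value types, counts and error handling are
simplified placeholders from my benchmark, not part of the patch):

  #include <bpf/bpf.h>

  /* "nobatch": one BPF_MAP_LOOKUP_AND_DELETE_ELEM syscall per element,
   * with the keys supplied by the caller (error handling omitted).
   */
  static void lnd_nobatch(int map_fd, __u32 *keys, __u64 *vals, __u32 n)
  {
          __u32 i;

          for (i = 0; i < n; i++)
                  bpf_map_lookup_and_delete_elem(map_fd, &keys[i], &vals[i]);
  }

  /* "batch": one BPF_MAP_LOOKUP_AND_DELETE_BATCH syscall; the kernel walks
   * the buckets and fills keys[]/vals[] with up to n deleted elements, so
   * the cost also grows with the map's max_entries.
   */
  static void lnd_batch(int map_fd, __u32 *keys, __u64 *vals, __u32 n)
  {
          DECLARE_LIBBPF_OPTS(bpf_map_batch_opts, opts,
                  .elem_flags = 0,
                  .flags = 0,
          );
          __u32 out_batch, count = n;

          bpf_map_lookup_and_delete_batch(map_fd, NULL, &out_batch,
                                          keys, vals, &count, &opts);
  }
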
For a map with 10 elements, 100,000 L&Ds of 1 element take:
  batch   : 0.055s
  nobatch : 0.050s
Same map, 100,000 L&Ds of 3 elements, batch is faster:
  batch   : 0.111s
  nobatch : 0.138s

For a map with 1,000 elements, 100,000 L&Ds of 1 element take:
  batch   : 0.149s
  nobatch : 0.050s
Same map, 100,000 L&Ds of 9 elements, batch gets faster:
  batch   : 0.316s
  nobatch : 0.418s

For a map with 10,000 elements, 100,000 L&Ds of 1 element take:
  batch   : 1.690s
  nobatch : 0.049s
Same map, 100,000 L&Ds of 125 elements, batch gets faster:
  batch   : 5.032s
  nobatch : 5.041s

My benchmark is simple: I load an XDP program with a hashmap and run a
dummy "statistics" program that does lookup-and-delete on that map in a
loop while also updating it with new elements so it is never empty, i.e.
the lookup-and-delete never returns -ENOENT. I'm not sure how realistic
this use case is, or whether cache hits/misses influence the results, so
this may not be the best benchmark.

Denis
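
P.S. In case it helps with reproducing: the map in the XDP program is just
a plain BPF_MAP_TYPE_HASH, along the lines of the sketch below (the map
name, key/value types and max_entries are made-up illustration values, not
the exact ones from my setup):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
          __uint(type, BPF_MAP_TYPE_HASH);
          __uint(max_entries, 10000);
          __type(key, __u32);
          __type(value, __u64);
  } stats_map SEC(".maps");

  SEC("xdp")
  int xdp_stats(struct xdp_md *ctx)
  {
          /* Dummy per-key counter; the userspace "statistics" program does
           * the lookup-and-delete and re-inserts elements so the map never
           * becomes empty.
           */
          __u32 key = 0;
          __u64 one = 1, *val;

          val = bpf_map_lookup_elem(&stats_map, &key);
          if (val)
                  __sync_fetch_and_add(val, 1);
          else
                  bpf_map_update_elem(&stats_map, &key, &one, BPF_ANY);
          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";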