On Tue, Sep 21, 2021 at 2:30 PM Joanne Koong <joannekoong@xxxxxx> wrote: > > Bloom filters are a space-efficient probabilistic data structure > used to quickly test whether an element exists in a set. > In a bloom filter, false positives are possible whereas false > negatives should never be. > > This patch adds a bloom filter map for bpf programs. > The bloom filter map supports peek (determining whether an element > is present in the map) and push (adding an element to the map) > operations.These operations are exposed to userspace applications > through the already existing syscalls in the following way: > > BPF_MAP_LOOKUP_ELEM -> peek > BPF_MAP_UPDATE_ELEM -> push > > The bloom filter map does not have keys, only values. In light of > this, the bloom filter map's API matches that of queue stack maps: > user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM > which correspond internally to bpf_map_peek_elem/bpf_map_push_elem, > and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem > APIs to query or add an element to the bloom filter map. When the > bloom filter map is created, it must be created with a key_size of 0. > > For updates, the user will pass in the element to add to the map > as the value, with a NULL key. For lookups, the user will pass in the > element to query in the map as the value. In the verifier layer, this > requires us to modify the argument type of a bloom filter's > BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in > the syscall layer, we need to copy over the user value so that in > bpf_map_peek_elem, we know which specific value to query. > > A few things to please take note of: > * If there are any concurrent lookups + updates, the user is > responsible for synchronizing this to ensure no false negative lookups > occur. > * The number of hashes to use for the bloom filter is configurable from > userspace. If no number is specified, the default used will be 5 hash > functions. The benchmarks later in this patchset can help compare the > performance of using different number of hashes on different entry > sizes. In general, using more hashes decreases the speed of a lookup, > but increases the false positive rate of an element being detected in the > bloom filter. > * Deleting an element in the bloom filter map is not supported. > * The bloom filter map may be used as an inner map. > * The "max_entries" size that is specified at map creation time is used to > approximate a reasonable bitmap size for the bloom filter, and is not > otherwise strictly enforced. If the user wishes to insert more entries into > the bloom filter than "max_entries", they may do so but they should be > aware that this may lead to a higher false positive rate. > > Signed-off-by: Joanne Koong <joannekoong@xxxxxx> > --- > include/linux/bpf_types.h | 1 + > include/uapi/linux/bpf.h | 1 + > kernel/bpf/Makefile | 2 +- > kernel/bpf/bloom_filter.c | 185 +++++++++++++++++++++++++++++++++ > kernel/bpf/syscall.c | 14 ++- > kernel/bpf/verifier.c | 19 +++- > tools/include/uapi/linux/bpf.h | 1 + > 7 files changed, 217 insertions(+), 6 deletions(-) > create mode 100644 kernel/bpf/bloom_filter.c > See some stylistic nitpicking below (and not a nitpicking about BTF). But I just wanted to say that I'm a bit amazed by how much special casing this BLOOM_FILTER map requires in syscall.c and verifier.c. I still believe that starting with a BPF helper for hashing would be a better approach, but oh well. [...] > + > +static inline u32 hash(struct bpf_bloom_filter *bloom_filter, void *value, > + u64 value_size, u32 index) > +{ > + if (bloom_filter->aligned_u32_count) > + return jhash2(value, bloom_filter->aligned_u32_count, > + bloom_filter->hash_seed + index) & > + bloom_filter->bit_array_mask; > + > + return jhash(value, value_size, bloom_filter->hash_seed + index) & > + bloom_filter->bit_array_mask; stylistic nit, but this feels way to dense text-wise, this seems easier to follow u32 h; if (bloom_filter->aligned_u32_count) h = jhash2(...); else h = jhash(...); return h & bloom_filter->bit_array_mask; WDYT? > +} > + > +static int bloom_filter_map_peek_elem(struct bpf_map *map, void *value) > +{ > + struct bpf_bloom_filter *bloom_filter = > + container_of(map, struct bpf_bloom_filter, map); > + u32 i; > + > + for (i = 0; i < bloom_filter->nr_hash_funcs; i++) { > + if (!test_bit(hash(bloom_filter, value, map->value_size, i), > + bloom_filter->bit_array)) > + return -ENOENT; same here, I think the hash calculation deserves a separate statement and a local variable > + } > + > + return 0; > +} > + [...] > +static void bloom_filter_map_free(struct bpf_map *map) > +{ > + struct bpf_bloom_filter *bloom_filter = > + container_of(map, struct bpf_bloom_filter, map); > + > + bpf_map_area_free(bloom_filter); > +} > + > +static int bloom_filter_map_push_elem(struct bpf_map *map, void *value, > + u64 flags) > +{ > + struct bpf_bloom_filter *bloom_filter = > + container_of(map, struct bpf_bloom_filter, map); > + u32 i; > + > + if (flags != BPF_ANY) > + return -EINVAL; > + > + for (i = 0; i < bloom_filter->nr_hash_funcs; i++) > + set_bit(hash(bloom_filter, value, map->value_size, i), > + bloom_filter->bit_array); same as above about hash() call on separate line > + > + return 0; > +} > + > +static void *bloom_filter_map_lookup_elem(struct bpf_map *map, void *key) > +{ > + /* The eBPF program should use map_peek_elem instead */ > + return ERR_PTR(-EINVAL); > +} > + > +static int bloom_filter_map_update_elem(struct bpf_map *map, void *key, > + void *value, u64 flags) > +{ > + /* The eBPF program should use map_push_elem instead */ > + return -EINVAL; > +} > + > +static int bloom_filter_map_delete_elem(struct bpf_map *map, void *key) > +{ > + return -EOPNOTSUPP; > +} > + > +static int bloom_filter_map_get_next_key(struct bpf_map *map, void *key, > + void *next_key) > +{ > + return -EOPNOTSUPP; > +} > + > +static int bloom_filter_map_btf_id; > +const struct bpf_map_ops bloom_filter_map_ops = { > + .map_meta_equal = bpf_map_meta_equal, > + .map_alloc = bloom_filter_map_alloc, > + .map_free = bloom_filter_map_free, > + .map_push_elem = bloom_filter_map_push_elem, > + .map_peek_elem = bloom_filter_map_peek_elem, > + .map_lookup_elem = bloom_filter_map_lookup_elem, > + .map_update_elem = bloom_filter_map_update_elem, > + .map_delete_elem = bloom_filter_map_delete_elem, > + .map_get_next_key = bloom_filter_map_get_next_key, > + .map_check_btf = map_check_no_btf, can you please implement basically a no-op callback here to allow specifying btf_value_id, there is no good reason to restrict this new map to not allow BTF type being specified for its value > + .map_btf_name = "bpf_bloom_filter", > + .map_btf_id = &bloom_filter_map_btf_id, > +}; [...]