On Wed, Sep 22, 2021 at 12:38 PM Martin KaFai Lau <kafai@xxxxxx> wrote: > > On Wed, Sep 22, 2021 at 12:06:02PM -0700, Joanne Koong wrote: > > > > On 9/21/21 4:44 PM, Andrii Nakryiko wrote: > > > On Tue, Sep 21, 2021 at 2:30 PM Joanne Koong <joannekoong@xxxxxx> wrote: > > > > Bloom filters are a space-efficient probabilistic data structure > > > > used to quickly test whether an element exists in a set. > > > > In a bloom filter, false positives are possible whereas false > > > > negatives should never be. > > > > > > > > This patch adds a bloom filter map for bpf programs. > > > > The bloom filter map supports peek (determining whether an element > > > > is present in the map) and push (adding an element to the map) > > > > operations.These operations are exposed to userspace applications > > > > through the already existing syscalls in the following way: > > > > > > > > BPF_MAP_LOOKUP_ELEM -> peek > > > > BPF_MAP_UPDATE_ELEM -> push > > > > > > > > The bloom filter map does not have keys, only values. In light of > > > > this, the bloom filter map's API matches that of queue stack maps: > > > > user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM > > > > which correspond internally to bpf_map_peek_elem/bpf_map_push_elem, > > > > and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem > > > > APIs to query or add an element to the bloom filter map. When the > > > > bloom filter map is created, it must be created with a key_size of 0. > > > > > > > > For updates, the user will pass in the element to add to the map > > > > as the value, with a NULL key. For lookups, the user will pass in the > > > > element to query in the map as the value. In the verifier layer, this > > > > requires us to modify the argument type of a bloom filter's > > > > BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in > > > > the syscall layer, we need to copy over the user value so that in > > > > bpf_map_peek_elem, we know which specific value to query. > > > > > > > > A few things to please take note of: > > > > * If there are any concurrent lookups + updates, the user is > > > > responsible for synchronizing this to ensure no false negative lookups > > > > occur. > > > > * The number of hashes to use for the bloom filter is configurable from > > > > userspace. If no number is specified, the default used will be 5 hash > > > > functions. The benchmarks later in this patchset can help compare the > > > > performance of using different number of hashes on different entry > > > > sizes. In general, using more hashes decreases the speed of a lookup, > > > > but increases the false positive rate of an element being detected in the > > > > bloom filter. > > > > * Deleting an element in the bloom filter map is not supported. > > > > * The bloom filter map may be used as an inner map. > > > > * The "max_entries" size that is specified at map creation time is used to > > > > approximate a reasonable bitmap size for the bloom filter, and is not > > > > otherwise strictly enforced. If the user wishes to insert more entries into > > > > the bloom filter than "max_entries", they may do so but they should be > > > > aware that this may lead to a higher false positive rate. > > > > > > > > Signed-off-by: Joanne Koong <joannekoong@xxxxxx> > > > > --- > > > > include/linux/bpf_types.h | 1 + > > > > include/uapi/linux/bpf.h | 1 + > > > > kernel/bpf/Makefile | 2 +- > > > > kernel/bpf/bloom_filter.c | 185 +++++++++++++++++++++++++++++++++ > > > > kernel/bpf/syscall.c | 14 ++- > > > > kernel/bpf/verifier.c | 19 +++- > > > > tools/include/uapi/linux/bpf.h | 1 + > > > > 7 files changed, 217 insertions(+), 6 deletions(-) > > > > create mode 100644 kernel/bpf/bloom_filter.c > > > > > > > See some stylistic nitpicking below (and not a nitpicking about BTF). > > > > > > But I just wanted to say that I'm a bit amazed by how much special > > > casing this BLOOM_FILTER map requires in syscall.c and verifier.c. I > > > still believe that starting with a BPF helper for hashing would be a > > > better approach, but oh well. > > > > > > [...] > > I liked your comment on v1 regarding using a BPF helper and I agree with the > > benefits you outlined. I'm curious to see what the performance differences between > > that approach and this one end up being, if any. I plan to test out the BPF helper > > approach in a few weeks, and if the performance is comparable or better, I am definitely open to > > reverting this code and just going with the BPF helper approach :) > Reverting won't be an option and I don't think it is necessary. > > Agree that a generic hash helper is in general useful. It may be > useful in hashing the skb also. The bpf prog only implementation could > have more flexibility in configuring roundup to pow2 or not, how to hash, > how many hashes, nr of bits ...etc. In the mean time, the bpf prog and Exactly. If I know better how many bits I need, I'll have to reverse engineer kernel's heuristic to provide such max_entries values to arrive at the desired amount of memory that Bloom filter will be using. > user space need to co-ordinate more and worry about more things, > e.g. how to reuse a bloom filter with different nr_hashes, > nr_bits, handle synchronization...etc. Please see my RFC ([0]). I don't think there is much to coordinate. It could be purely BPF-side code, or BPF + user-space initialization code, depending on the need. It's a simple and beautiful algorithm, which BPF is powerful enough to implement customly and easily. [0] https://lore.kernel.org/bpf/20210922203224.912809-1-andrii@xxxxxxxxxx/T/#t > > It is useful to have a default implementation in the kernel > for some useful maps like this one that works for most > common cases and the bpf user can just use it as get-and-go > like all other common bpf maps do. I disagree with the premise that Bloom filter is a common and generally useful data structure, tbh. It has its nice niche applications, but its semantics isn't applicable generally, which is why I hesitate to claim that this should live in kernel. > > imo, the verifier/syscall change here is quite minimal also and it > is mostly riding on top of the existing BPF_MAP_TYPE_STACK. Not exactly true, it adds quite a bit of new custom pieces. But it's subjective so there is no point in arguing.