Andrii Nakryiko wrote: > On Wed, Sep 1, 2021 at 8:18 PM John Fastabend <john.fastabend@xxxxxxxxx> wrote: > > > > Joanne Koong wrote: > > > Bloom filters are a space-efficient probabilistic data structure > > > used to quickly test whether an element exists in a set. > > > In a bloom filter, false positives are possible whereas false > > > negatives are not. > > > > > > This patch adds a bloom filter map for bpf programs. > > > The bloom filter map supports peek (determining whether an element > > > is present in the map) and push (adding an element to the map) > > > operations.These operations are exposed to userspace applications > > > through the already existing syscalls in the following way: > > > > > > BPF_MAP_LOOKUP_ELEM -> peek > > > BPF_MAP_UPDATE_ELEM -> push > > > > > > The bloom filter map does not have keys, only values. In light of > > > this, the bloom filter map's API matches that of queue stack maps: > > > user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM > > > which correspond internally to bpf_map_peek_elem/bpf_map_push_elem, > > > and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem > > > APIs to query or add an element to the bloom filter map. When the > > > bloom filter map is created, it must be created with a key_size of 0. > > > > > > For updates, the user will pass in the element to add to the map > > > as the value, wih a NULL key. For lookups, the user will pass in the > > > element to query in the map as the value. In the verifier layer, this > > > requires us to modify the argument type of a bloom filter's > > > BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in > > > the syscall layer, we need to copy over the user value so that in > > > bpf_map_peek_elem, we know which specific value to query. > > > > > > The maximum number of entries in the bloom filter is not enforced; if > > > the user wishes to insert more entries into the bloom filter than they > > > specified as the max entries size of the bloom filter, that is permitted > > > but the performance of their bloom filter will have a higher false > > > positive rate. > > > > hmm I'm wondering if this means the memory footprint can grow without > > bounds? Typically maps have an upper bound on memory established at > > alloc time. > > It's a bit unfortunate wording, but no, the amount of used memory is > fixed. Bloom filter is a probabilistic data structure in which each > "value" has few designated bits, determined by hash functions on that > value. The number of bits is fixed, though. If all designated bits are > set to 1, then we declare "value" to be present in the Bloom filter. Thanks actually reading the code helped as well. Looks like a v2 will likely happen perhaps a small note here for maxiumum number of entries in the bloom filter is only used to estimate the number of bits used. I guess if a BPF user did want to enforce the max number of entries a simple BPF counter wouldn't be much for users to add. I usually add these to maps for debug/statistic reasons anyways. > If at least one is 0, then we definitely didn't see "value" yet > (that's what guarantees no false negatives; this also answers Alexei's > worry about possible false negative due to unsynchronized update and > lookup, it can't happen by the nature of the data structure design, > regardless of synchronization). We can, of course, have all such bits > set to 1 even if the actual value was never "added" into the Bloom > filter, just by the nature of hash collisions with other elements' > hash functions (that's where the false positive comes from). It might > be useful to just leave a link to Wikipedia for description of Bloom > filter data structure ([0]). > > [0] https://en.wikipedia.org/wiki/Bloom_filter Thanks. Yep needed a refresher for sure.