On Thu, Sep 23, 2021 at 11:42:55AM -0700, Andrii Nakryiko wrote:
> On Wed, Sep 22, 2021 at 6:28 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
> >
> > On Wed, Sep 22, 2021 at 04:07:52PM -0700, Andrii Nakryiko wrote:
> > > > > Please see my RFC ([0]). I don't think there is much to
> > > > > coordinate. It could be purely BPF-side code, or BPF +
> > > > > user-space initialization code, depending on the need. It's a
> > > > > simple and beautiful algorithm, which BPF is powerful enough
> > > > > to implement custom versions of easily.
> > > > >
> > > > > [0] https://lore.kernel.org/bpf/20210922203224.912809-1-andrii@xxxxxxxxxx/T/#t
> > > > In practice, the bloom filter will be populated only once by the
> > > > userspace.
> > > >
> > > > Future updates will be done by map-in-map, replacing the whole
> > > > bloom filter: maybe with more max_entries and more nr_hashes,
> > > > maybe with fewer max_entries and fewer nr_hashes.
> > > >
> > > > Currently, the continuously running bpf prog using this bloom
> > > > filter does not need to worry about any change in the newer
> > > > bloom filter's configuration/setup.
> > > >
> > > > I wonder how that may look in the custom bpf bloom filter in
> > > > the bench prog for the map-in-map usage.
> > > You'd have to use BPF_MAP_TYPE_ARRAY for the map-in-map use case.
> > Right, another map is needed. When the user space generates a new
> > bloom filter as the inner map, it will likely have a different
> > number of entries, so the map size is different.
> >
> > The old and new inner array maps need to at least have the same
> > value_size, so a one-element array with a different value_size will
> > not work.
> >
> > An inner array map with BPF_F_INNER_MAP can have different
> > max_entries, but then there is no inline lookup code generation. It
> > may not be too bad to call the lookup multiple times for one value,
> > considering that array_map_lookup_elem will still be called
> > directly, without a retpoline.
>
> All true, of course, due to generic BPF limitations.
> In practice, I'd decide what the maximum size of the bloom filter I'd
> need is and use that as the inner map definition. If I understand
> correctly, there is going to be only one "active" Bloom filter map,
> and it's generally not that big (a few megabytes cover tons of
> "max_entries"), so I'd just work with the maximum expected size.

A constant size defeats the configurability/flexibility/memory-usage
argument for the custom bpf map.

> If I absolutely needed variable-sized filters, I'd consider doing a
> multi-element array as you suggested, but I'd expect lower
> performance, as you mentioned.

Yep. I also expect it will be worse, and it reduces the benefit of
using a bloom filter.

> > The next part is how to learn those "const volatile __u32 bloom_*;"
> > values of the new inner map. I think max_entries can be obtained
> > from map_ptr->max_entries. Other vars (e.g. hash_cnt and seed) can
> > be non-const globals that allow updates; a brief moment of
> > inconsistency may be fine.
>
> For a single-element array with fixed value_size I'd put those in the
> first 8 bytes:
>
> struct my_bloom {
>         __u32 msk;
>         __u32 seed;
>         __u64 data[];
> };
>
> For a multi-element BPF_MAP_TYPE_ARRAY I'd put the mask and seed into
> elem[0].

Yes, the same thing came to my mind, but I was very hesitant even to
mention it: it is one extra thing the user needs to remember to do to
avoid potential false negatives during the map switching, and the user
expects false negatives not to happen with a bloom filter once the map
is fully populated.

> I'd expect that hash_cnt would just be hard-coded because, as I
> mentioned before, it determines the probability of a false positive,
> which is what end-users probably care about the most and set upfront.
> At least they should be coming at this from the perspective of "1% of
> false positives is acceptable" rather than "hmm... 3 hash functions
> is probably acceptable", no? But if not, the first two elements would
> be taken.
I am not sure a constant number of hashes (i.e. a constant
false-positive rate) is always the use case either, but yes, another
element in the arraymap would be needed.

> > It all sounds doable, but all these small need-to-pay-attention
> > things add up.
>
> Of course, there is always a tension between "make it simple and
> provide a dedicated BPF helper/BPF map" and "let users implement it
> on their own". I'm saying I'm not convinced that it has to be the
> former in this case. A Bloom filter is just a glorified bit set, once
> you have a hashing helper. I don't think we've added
> BPF_MAP_TYPE_BITSET yet, though it probably would be pretty useful in
> some cases, just like the Bloom filter. Similarly, we don't have
> BPF_MAP_TYPE_HASHSET in addition to BPF_MAP_TYPE_HASHMAP. I've seen
> many cases where a HASHMAP is used as a HASHSET, yet we didn't have a
> dedicated map for that either. I'm just curious where we draw the
> line between what should be added to the kernel for BPF, if there are
> reasonable ways to avoid that.

I think the ups and downs of both arguments are pretty clear in terms
of performance, configurability, and map-in-map. How do we move this
forward from here? Is it a must to keep the bloom-filter-like map in
pure bpf only, such that we should stop the current work? Or is it
useful to have the kernel bloom filter, which provides get-and-go
usage and a consistent map-in-map experience for the user, which is
the primary use case here? I don't think this necessarily blocks the
custom bpf map effort either.