On Thu, Sep 23, 2021 at 11:42:55AM -0700, Andrii Nakryiko wrote:
> On Wed, Sep 22, 2021 at 6:28 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
> >
> > On Wed, Sep 22, 2021 at 04:07:52PM -0700, Andrii Nakryiko wrote:
> > > > > Please see my RFC ([0]). I don't think there is much to
> > > > > coordinate. It could be purely BPF-side code, or BPF +
> > > > > user-space initialization code, depending on the need. It's a
> > > > > simple and beautiful algorithm, which BPF is powerful enough
> > > > > to implement custom versions of easily.
> > > > >
> > > > > [0] https://lore.kernel.org/bpf/20210922203224.912809-1-andrii@xxxxxxxxxx/T/#t
> > > > In practice, the bloom filter will be populated only once by the
> > > > userspace.
> > > >
> > > > Future updates will be done by map-in-map, replacing the whole
> > > > bloom filter: maybe with more max_entries and more nr_hashes,
> > > > maybe with fewer max_entries and fewer nr_hashes.
> > > >
> > > > Currently, the continuously running bpf prog using this bloom
> > > > filter does not need to worry about any change in the newer
> > > > bloom filter's configuration/setup.
> > > >
> > > > I wonder how that may look in the custom bpf bloom filter in
> > > > the bench prog for the map-in-map usage.
> > > You'd have to use BPF_MAP_TYPE_ARRAY for the map-in-map use case.
> > Right, another map is needed. When the user space generates a new
> > bloom filter as the inner map, it will likely have a different
> > number of entries, so the map size is different.
> >
> > The old and new inner array maps need to at least have the same
> > value_size, so a one-element array with a different value_size will
> > not work.
> >
> > An inner array map with BPF_F_INNER_MAP can have different
> > max_entries, but then there is no inline lookup code generation. It
> > may not be too bad to call the lookup multiple times for one value,
> > considering that array_map_lookup_elem will still be called
> > directly, without a retpoline.
>
> All true, of course, due to generic BPF limitations.
> In practice, I'd decide what the maximum size of the bloom filter I'd
> need is and use that as the inner map definition. If I understand
> correctly, there is going to be only one "active" Bloom filter map,
> and it's generally not that big (a few megabytes cover tons of
> "max_entries"), so I'd just work with the maximum expected size.

A constant size defeats the configurability/flexibility/memory-usage
argument for the custom bpf map.

> If I absolutely needed variable-sized filters, I'd consider doing a
> multi-element array as you suggested, but I'd expect lower
> performance, as you mentioned.

Yep. I also expect it will be worse, and it reduces the benefit of
using a bloom filter.

> > The next part is how to learn those "const volatile __u32 bloom_*;"
> > values of the new inner map. I think max_entries can be obtained
> > from map_ptr->max_entries. Other vars (e.g. hash_cnt and seed) can
> > be non-const globals that allow updates; a brief moment of
> > inconsistency may be fine.
>
> For a single-element array with fixed value_size I'd put those in the
> first 8 bytes:
>
> struct my_bloom {
>         __u32 msk;
>         __u32 seed;
>         __u64 data[];
> };
>
> For a multi-element BPF_MAP_TYPE_ARRAY I'd put the mask and seed into
> elem[0].

Yes, the same thing came to my mind, but I was very hesitant even to
mention it: it is one extra thing the user needs to remember to do to
avoid potential false negatives during the map switching, and the user
expects false negatives not to happen with a bloom filter once the map
is fully populated.

> I'd expect that hash_cnt would just be hard-coded because, as I
> mentioned before, it determines the probability of a false positive,
> which is what end-users probably care about the most and set upfront.
> At least they should be coming at this from the perspective of "1% of
> false positives is acceptable" rather than "hmm... 3 hash functions
> is probably acceptable", no? But if not, the first two elements would
> be taken.
I am not sure a constant number of hashes (i.e. a constant
false-positive rate) is always the use case either, but yes, another
element in the arraymap would be needed.

> > It all sounds doable, but all these small need-to-pay-attention
> > things add up.
>
> Of course, there is always a tension between "make it simple and
> provide a dedicated BPF helper/BPF map" and "let users implement it
> on their own". I'm saying I'm not convinced that it has to be the
> former in this case. A Bloom filter is just a glorified bit set, once
> you have a hashing helper. I don't think we've added
> BPF_MAP_TYPE_BITSET yet, though it probably would be pretty useful in
> some cases, just like the Bloom filter. Similarly, we don't have
> BPF_MAP_TYPE_HASHSET in addition to BPF_MAP_TYPE_HASHMAP. I've seen
> many cases where a HASHMAP is used as a HASHSET, yet we didn't have a
> dedicated map for that either. I'm just curious where we draw the
> line between what should be added to the kernel for BPF, if there are
> reasonable ways to avoid that.

I think the ups and downs of both arguments are pretty clear in terms
of performance, configurability, and map-in-map. How do we move this
forward from here? Is it a must to keep the bloom-filter-like map in
pure bpf only, such that we should stop the current work? Or is it
useful to have the kernel bloom filter, which provides get-and-go
usage and a consistent map-in-map experience for the user, which is
the primary use case here? I don't think this necessarily blocks the
custom bpf map effort either.