On Wed, Oct 27, 2021 at 12:14 PM Joanne Koong <joannekoong@xxxxxx> wrote:
>
> On 10/26/21 8:18 PM, Andrii Nakryiko wrote:
> >
> > On 10/22/21 3:02 PM, Joanne Koong wrote:
> >> This patch adds the kernel-side changes for the implementation of
> >> a bpf bloom filter map.
> >>
> >> The bloom filter map supports peek (determining whether an element
> >> is present in the map) and push (adding an element to the map)
> >> operations. These operations are exposed to userspace applications
> >> through the already existing syscalls in the following way:
> >>
> >> BPF_MAP_LOOKUP_ELEM -> peek
> >> BPF_MAP_UPDATE_ELEM -> push
> >>
> >> The bloom filter map does not have keys, only values. In light of
> >> this, the bloom filter map's API matches that of queue/stack maps:
> >> user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM,
> >> which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
> >> and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
> >> APIs to query or add an element to the bloom filter map. When the
> >> bloom filter map is created, it must be created with a key_size of 0.
> >>
> >> For updates, the user will pass in the element to add to the map
> >> as the value, with a NULL key. For lookups, the user will pass in the
> >> element to query in the map as the value, with a NULL key. In the
> >> verifier layer, this requires us to modify the argument type of
> >> a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE;
> >> as well, in the syscall layer, we need to copy over the user value
> >> so that in bpf_map_peek_elem, we know which specific value to query.
> >>
> >> A few things to please take note of:
> >> * If there are any concurrent lookups + updates, the user is
> >> responsible for synchronizing this to ensure no false negative lookups
> >> occur.
> >> * The number of hashes to use for the bloom filter is configurable from
> >> userspace. If no number is specified, the default used will be 5 hash
> >> functions. The benchmarks later in this patchset can help compare the
> >> performance of using different numbers of hashes on different entry
> >> sizes. In general, using more hashes decreases both the false positive
> >> rate and the speed of a lookup.
> >> * Deleting an element in the bloom filter map is not supported.
> >> * The bloom filter map may be used as an inner map.
> >> * The "max_entries" size that is specified at map creation time is used
> >> to approximate a reasonable bitmap size for the bloom filter, and is not
> >> otherwise strictly enforced. If the user wishes to insert more entries
> >> into the bloom filter than "max_entries", they may do so but they should
> >> be aware that this may lead to a higher false positive rate.
> >>
> >> Signed-off-by: Joanne Koong <joannekoong@xxxxxx>
> >> ---
> >
> > Apart from a few minor comments below and the stuff that Martin
> > mentioned, LGTM.
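
The NULL-key calling convention described above is probably easiest to
see in code. Here is a rough, untested user-space sketch; it assumes
BPF_MAP_TYPE_BLOOM_FILTER and the map_extra attribute reach the UAPI
exactly as posted in this series, and the number-of-hashes encoding in
the comment is an assumption for illustration only:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static int sys_bpf(int cmd, union bpf_attr *attr)
{
	return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

int main(void)
{
	union bpf_attr attr = {};
	__u32 val = 42;
	int fd, err;

	/* No keys: key_size must be 0. max_entries only sizes the bitmap
	 * and is not strictly enforced. */
	attr.map_type = BPF_MAP_TYPE_BLOOM_FILTER;
	attr.key_size = 0;
	attr.value_size = sizeof(val);
	attr.max_entries = 10000;
	attr.map_extra = 3;	/* nr of hash functions; 0 means default (5) */
	fd = sys_bpf(BPF_MAP_CREATE, &attr);
	if (fd < 0)
		return 1;

	/* push: BPF_MAP_UPDATE_ELEM with a NULL key */
	memset(&attr, 0, sizeof(attr));
	attr.map_fd = fd;
	attr.key = 0;	/* NULL key */
	attr.value = (__u64)(unsigned long)&val;
	attr.flags = BPF_ANY;
	err = sys_bpf(BPF_MAP_UPDATE_ELEM, &attr);
	if (err)
		fprintf(stderr, "push failed: %d\n", errno);

	/* peek: BPF_MAP_LOOKUP_ELEM with a NULL key. 0 means "probably
	 * present", -ENOENT means "definitely not present". */
	memset(&attr, 0, sizeof(attr));
	attr.map_fd = fd;
	attr.key = 0;	/* NULL key */
	attr.value = (__u64)(unsigned long)&val;
	err = sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr);
	printf("lookup: %s\n", err ? "definitely absent" : "probably present");

	close(fd);
	return 0;
}

On the BPF program side, the same map is driven with the
bpf_map_push_elem() and bpf_map_peek_elem() helpers, as the commit
message says.
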
> >
> > Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> >
> >>  include/linux/bpf.h            |   2 +
> >>  include/linux/bpf_types.h      |   1 +
> >>  include/uapi/linux/bpf.h       |   8 ++
> >>  kernel/bpf/Makefile            |   2 +-
> >>  kernel/bpf/bloom_filter.c      | 198 +++++++++++++++++++++++++++++++++
> >>  kernel/bpf/syscall.c           |  19 +++-
> >>  kernel/bpf/verifier.c          |  19 +++-
> >>  tools/include/uapi/linux/bpf.h |   8 ++
> >>  8 files changed, 250 insertions(+), 7 deletions(-)
> >>  create mode 100644 kernel/bpf/bloom_filter.c
> >>
> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >> index 31421c74ba08..953d23740ecc 100644
> >> --- a/include/linux/bpf.h
> >> +++ b/include/linux/bpf.h
> >> @@ -193,6 +193,8 @@ struct bpf_map {
> >>  	struct work_struct work;
> >>  	struct mutex freeze_mutex;
> >>  	u64 writecnt; /* writable mmap cnt; protected by freeze_mutex */
> >> +
> >> +	u64 map_extra; /* any per-map-type extra fields */
> >
> > It's minor, but given this is a read-only value, it makes more sense
> > to put it after map_flags so that it doesn't share a cache line with
> > refcounting and mutex fields.
> >
> Awesome, I will make this change.
>
> One question I have in general that's semi-related is about
> backwards-compatibility. I might be completely misremembering, but I
> recall hearing something about only adding fields to the end of structs
> in some headers under the linux/include directory, so that this doesn't
> mess up backwards-compatibility with older kernel versions.
>
> Is this 100% false or is there a subset under linux/include (like
> linux/include/uapi/linux/*) that we do need to adhere to this for?

This backwards compatibility applies only to UAPI types, i.e., anything
in headers under include/uapi (e.g., include/uapi/linux/bpf.h, which
defines the "public interface" to BPF). In this case, though, you are
modifying an internal kernel type: struct bpf_map is not exposed to
user-space, so you can re-shuffle its fields if necessary.

> >
> > [...]
> >
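
To make the UAPI vs. internal distinction above concrete, a rough
sketch (the structures below are illustrative stand-ins, not the real
definitions; only map_extra is taken from the patch):

/* include/linux/bpf.h is an internal header, not UAPI: field order can
 * change at will, so the read-only map_extra can sit next to map_flags
 * instead of sharing a cache line with refcount/mutex fields that get
 * written at runtime. */
struct bpf_map_sketch {
	unsigned int map_flags;
	unsigned long long map_extra;	/* read-only after map creation */
	/* ... hot, frequently written fields live further down ... */
	long refcnt;
	/* struct mutex freeze_mutex; etc. */
};

/* include/uapi/linux/bpf.h is UAPI: existing field offsets are ABI, so
 * a new attribute like map_extra can only be appended after the existing
 * fields of the BPF_MAP_CREATE section of union bpf_attr. */
struct bpf_map_create_attr_sketch {
	unsigned int map_type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	/* ... existing fields keep their offsets ... */
	unsigned long long map_extra;	/* new field goes at the end */
};
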