Re: [PATCH v5 bpf-next 1/5] bpf: Add bloom filter map implementation

On 10/22/21 3:02 PM, Joanne Koong wrote:
This patch adds the kernel-side changes for the implementation of
a bpf bloom filter map.

The bloom filter map supports peek (determining whether an element
is present in the map) and push (adding an element to the map)
operations. These operations are exposed to userspace applications
through the already existing syscalls in the following way:

BPF_MAP_LOOKUP_ELEM -> peek
BPF_MAP_UPDATE_ELEM -> push

The bloom filter map does not have keys, only values. In light of
this, the bloom filter map's API matches that of queue/stack maps:
user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
APIs to query or add an element to the bloom filter map. When the
bloom filter map is created, it must be created with a key_size of 0.

For updates, the user will pass in the element to add to the map
as the value, with a NULL key. For lookups, the user will pass in the
element to query in the map as the value, with a NULL key. In the
verifier layer, this requires us to modify the argument type of
a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE;
additionally, in the syscall layer, we need to copy in the user value
so that in bpf_map_peek_elem, we know which specific value to query.

A few things to please take note of:
  * If there are any concurrent lookups and updates, the user is
responsible for synchronizing them to ensure no false negative lookups
occur.
  * The number of hashes to use for the bloom filter is configurable from
userspace. If no number is specified, the default used will be 5 hash
functions. The benchmarks later in this patchset can help compare the
performance of different numbers of hashes on different entry
sizes. In general, using more hashes lowers the false positive
rate but also slows down each lookup.
  * Deleting an element in the bloom filter map is not supported.
  * The bloom filter map may be used as an inner map.
  * The "max_entries" size that is specified at map creation time is used
to approximate a reasonable bitmap size for the bloom filter, and is not
otherwise strictly enforced. If the user wishes to insert more entries
into the bloom filter than "max_entries", they may do so but they should
be aware that this may lead to a higher false positive rate.

Signed-off-by: Joanne Koong <joannekoong@xxxxxx>
---


Apart from a few minor comments below and the stuff that Martin mentioned, LGTM.

Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>


  include/linux/bpf.h            |   2 +
  include/linux/bpf_types.h      |   1 +
  include/uapi/linux/bpf.h       |   8 ++
  kernel/bpf/Makefile            |   2 +-
  kernel/bpf/bloom_filter.c      | 198 +++++++++++++++++++++++++++++++++
  kernel/bpf/syscall.c           |  19 +++-
  kernel/bpf/verifier.c          |  19 +++-
  tools/include/uapi/linux/bpf.h |   8 ++
  8 files changed, 250 insertions(+), 7 deletions(-)
  create mode 100644 kernel/bpf/bloom_filter.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 31421c74ba08..953d23740ecc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -193,6 +193,8 @@ struct bpf_map {
  	struct work_struct work;
  	struct mutex freeze_mutex;
  	u64 writecnt; /* writable mmap cnt; protected by freeze_mutex */
+
+	u64 map_extra; /* any per-map-type extra fields */


It's minor, but given this is a read-only value, it makes more sense to put it after map_flags so that it doesn't share a cache line with the refcounting and mutex fields.


  };
static inline bool map_value_has_spin_lock(const struct bpf_map *map)
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 9c81724e4b98..c4424ac2fa02 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -125,6 +125,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
  BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
  #endif
  BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
  BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c10820037883..66827b93f548 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -906,6 +906,7 @@ enum bpf_map_type {
  	BPF_MAP_TYPE_RINGBUF,
  	BPF_MAP_TYPE_INODE_STORAGE,
  	BPF_MAP_TYPE_TASK_STORAGE,
+	BPF_MAP_TYPE_BLOOM_FILTER,
  };
/* Note that tracing related programs such as
@@ -1252,6 +1253,12 @@ struct bpf_stack_build_id {
  #define BPF_OBJ_NAME_LEN 16U

+/* map_extra flags
+ *
+ * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the number of hash
+ * functions (if 0, the bloom filter will default to using 5 hash functions).
+ */
+


This comment makes more sense right before the map_extra field below. I only noticed it by accident.


  union bpf_attr {
  	struct { /* anonymous struct used by BPF_MAP_CREATE command */
  		__u32	map_type;	/* one of enum bpf_map_type */
@@ -1274,6 +1281,7 @@ union bpf_attr {
  						   * struct stored as the
  						   * map value
  						   */
+		__u64	map_extra;	/* any per-map-type extra fields */
  	};
struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */


[...]



