On Wed, Sep 01, 2021 at 10:11:20PM -0700, John Fastabend wrote:
[ ... ]

> > > +static struct bpf_map *bloom_filter_map_alloc(union bpf_attr *attr)
> > > +{
> > > +	int numa_node = bpf_map_attr_numa_node(attr);
> > > +	u32 nr_bits, bit_array_bytes, bit_array_mask;
> > > +	struct bpf_bloom_filter *bloom_filter;
> > > +
> > > +	if (!bpf_capable())
> > > +		return ERR_PTR(-EPERM);
> > > +
> > > +	if (attr->key_size != 0 || attr->value_size == 0 || attr->max_entries == 0 ||
> > > +	    attr->nr_hashes == 0 || attr->map_flags & ~BLOOM_FILTER_CREATE_FLAG_MASK ||
> > > +	    !bpf_map_flags_access_ok(attr->map_flags))
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > > +	/* For the bloom filter, the optimal bit array size that minimizes the
> > > +	 * false positive probability is n * k / ln(2) where n is the number of
> > > +	 * expected entries in the bloom filter and k is the number of hash
> > > +	 * functions. We use 7 / 5 to approximate 1 / ln(2).
> > > +	 *
> > > +	 * We round this up to the nearest power of two to enable more efficient
> > > +	 * hashing using bitmasks. The bitmask will be the bit array size - 1.
> > > +	 *
> > > +	 * If this overflows a u32, the bit array size will have 2^32 (4
> > > +	 * GB) bits.
>
> Would it be better to return E2BIG or EINVAL here? Speculating a bit, but if I was
> a user I might want to know that the number of bits I pushed down is not the actual
> number?
>
> Another thought, would it be simpler to let the user do this calculation and just
> let max_entries be the number of bits they want? Then we could have examples with
> the above comment. Just a thought...
Instead of having users second-guess what max_entries means for a
particular map, I think it is better to keep the meaning of max_entries
as consistent as possible across map types and let the kernel figure
out the correct nr_bits to use.
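For reference, here is a rough sketch of the nr_bits calculation the
comment above describes.  The helper name is made up and the overflow
handling is only illustrative; it is not the actual patch:

/* Sketch only: nr_bits ~= n * k / ln(2), with 7 / 5 approximating
 * 1 / ln(2), rounded up to a power of two so "hash & (nr_bits - 1)"
 * can be used instead of a modulo.  Assumes roundup_pow_of_two() from
 * linux/log2.h and that nr_hashes is bounded to a small value (the
 * thread suggests <= 16), so the multiplication cannot overflow u64.
 */
static u64 bloom_nr_bits_sketch(u32 max_entries, u32 nr_hashes)
{
	u64 nr_bits = (u64)max_entries * nr_hashes * 7 / 5;

	/* If this does not fit in a u32, fall back to 2^32 bits. */
	if (nr_bits > U32_MAX)
		return 1ULL << 32;

	return roundup_pow_of_two(nr_bits);
}

bit_array_mask would then be nr_bits - 1 (all ones in the 2^32 case),
which is what allows the cheap "hash & bit_array_mask" indexing in the
push/peek paths.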
[ ... ]

> > > +static int bloom_filter_map_push_elem(struct bpf_map *map, void *value,
> > > +				      u64 flags)
> > > +{
> > > +	struct bpf_bloom_filter *bloom_filter =
> > > +		container_of(map, struct bpf_bloom_filter, map);
> > > +	unsigned long spinlock_flags;
> > > +	u32 i, hash;
> > > +
> > > +	if (flags != BPF_ANY)
> > > +		return -EINVAL;
> > > +
> > > +	spin_lock_irqsave(&bloom_filter->spinlock, spinlock_flags);
> > > +
> >
> > If value_size is pretty big, hashing might take a noticeable amount of
> > CPU, during which we'll be keeping spinlock. With what I said above
Good catch on the big value_size case.

> > about sane number of hashes, if we bound it to small reasonable number
> > (e.g., 16), we can have a local 16-element array with hashes
> > calculated before we take lock. That way spinlock will be held only
> > for few bit flips.
>
> +1. Anyways, we are inside an RCU section here and the map shouldn't
> disappear without a grace period, so we can READ_ONCE the seed, right?
> Or are we thinking about sleepable programs here?
>
> > Also, I wonder if ditching spinlock in favor of atomic bit set
> > operation would improve performance in typical scenarios. Seems like
> > set_bit() is an atomic operation, so it should be easy to test. Do you
> > mind running benchmarks with spinlock and with set_bit()?
The atomic set_bit() is a good idea.  Then there is no need for the
16-element array, which keeps things simple (a rough sketch of what the
push path could look like is at the end of this mail).  It is in
general useful to optimize the update/push path (e.g. I would like to
have the bloom-filter bench populate millions of entries faster).  Our
current use case is to have userspace populate the map (e.g. with a lot
of suspicious IPs that we have already learned) at the very beginning
and then do very sparse updates after that.  The bpf prog will mostly
only lookup/peek, which I think is the better optimization and
benchmark target.

> With the jhash pulled out of the lock, I think it might be noticeable.
> Curious to see.
>
> > > +	for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
> > > +		hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
> > > +			bloom_filter->bit_array_mask;
> > > +		bitmap_set(bloom_filter->bit_array, hash, 1);
> > > +	}
> > > +
> > > +	spin_unlock_irqrestore(&bloom_filter->spinlock, spinlock_flags);
> > > +
> > > +	return 0;
> > > +}
> > > +
> >
> > [...]
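And here is the rough, untested sketch mentioned above of what the push
path could look like with the jhash calls kept outside of any lock and
the spinlock replaced by the atomic set_bit().  It is only meant to
illustrate the idea being discussed, not the final patch:

static int bloom_filter_map_push_elem(struct bpf_map *map, void *value,
				      u64 flags)
{
	struct bpf_bloom_filter *bloom_filter =
		container_of(map, struct bpf_bloom_filter, map);
	u32 i, hash;

	if (flags != BPF_ANY)
		return -EINVAL;

	for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
		hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
			bloom_filter->bit_array_mask;
		/* atomic RMW on the containing word, no spinlock needed */
		set_bit(hash, bloom_filter->bit_array);
	}

	return 0;
}

Since set_bit() is an atomic read-modify-write on the word containing
the bit, concurrent pushes can only race to set the same bits, which is
harmless for a bloom filter.  The spinlock vs. set_bit() benchmark
Andrii asked for would still be good to see.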