Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Joanne Koong <joannekoong@xxxxxx> writes:

> On 10/7/21 7:20 AM, Toke Høiland-Jørgensen wrote:
>
>> Joanne Koong <joannekoong@xxxxxx> writes:
>>
>>> This patch adds the kernel-side changes for the implementation of
>>> a bitset map with bloom filter capabilities.
>>>
>>> The bitset map does not have keys, only values since it is a
>>> non-associative data type. When the bitset map is created, it must
>>> be created with a key_size of 0, and the max_entries value should be the
>>> desired size of the bitset, in number of bits.
>>>
>>> The bitset map supports peek (determining whether a bit is set in the
>>> map), push (setting a bit in the map), and pop (clearing a bit in the
>>> map) operations. These operations are exposed to userspace applications
>>> through the already existing syscalls in the following way:
>>>
>>> BPF_MAP_UPDATE_ELEM -> bpf_map_push_elem
>>> BPF_MAP_LOOKUP_ELEM -> bpf_map_peek_elem
>>> BPF_MAP_LOOKUP_AND_DELETE_ELEM -> bpf_map_pop_elem
>>>
>>> For updates, the user will pass in a NULL key and the index of the
>>> bit to set in the bitmap as the value. For lookups, the user will pass
>>> in the index of the bit to check as the value. If the bit is set, 0
>>> will be returned, else -ENOENT. For clearing the bit, the user will pass
>>> in the index of the bit to clear as the value.
>> This is interesting, and I can see other uses of such a data structure.
>> However, a couple of questions (talking mostly about the 'raw' bitmap
>> without the bloom filter enabled):
>>
>> - How are you envisioning synchronisation to work? The code is using the
>>    atomic set_bit() operation, but there's no test_and_{set,clear}_bit().
>>    Any thoughts on how users would do this?
> I was thinking for users who are doing concurrent lookups + updates,
> they are responsible for synchronizing the operations through mutexes.
> Do you think this makes sense / is reasonable?

Right, that is an option, of course, but it's a bit heavyweight. Atomic
bitops are a nice light-weight synchronisation primitive.

Hmm, looking at your code again, you're already using
test_and_clear_bit() in pop_elem(). So why not just mirror that to
test_and_set_bit() in push_elem(), and change the returns to EEXIST and
ENOENT if trying to set or clear a bit that is already set or cleared
(respectively)?

>> - It would be useful to expose the "find first set (ffs)" operation of
>>    the bitmap as well. This can be added later, but thinking about the
>>    API from the start would be good to avoid having to add a whole
>>    separate helper for this. My immediate thought is to reserve peek(-1)
>>    for this use - WDYT?
> I think using peek(-1) for "find first set" sounds like a great idea!

Awesome!

>> - Any thoughts on inlining the lookups? This should at least be feasible
>>    for the non-bloom-filter type, but I'm not quite sure if the use of
>>    map_extra allows the verifier to distinguish between the map types
>>    (I'm a little fuzzy on how the inlining actually works)? And can
>>    peek()/push()/pop() be inlined at all?
>
> I am not too familiar with how bpf instructions and inlining works, but
> from a first glance, this looks doable for both the non-bloom filter
> and bloom filter cases. From my cursory understanding of how it works,
> it seems like we could have something like "bitset_map_gen_lookup" where
> we parse the bpf_map->map_extra to see if the bloom filter is enabled;
> if it is, we could call the hash function directly to compute which bit 
> to look up,
> and then use the same insn logic for looking up the bit in both cases
> (the bitmap w/ and w/out the bloom filter).
>
> I don't think there is support yet in the verifier for inlining
> peek()/push()/pop(), but it seems like this should be doable as well.
>
> I think these changes would maybe warrant a separate patchset
> on top of this one. What are your thoughts?

Ah yes, I think you're right, this should be possible to add later. I'm
fine with deferring that to a separate series, then :)

-Toke





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux