Re: [RFC bpf-next 0/3] bpf: adding map batch processing support

Yonghong Song <yhs@xxxxxx> · Wed, 13 Nov 2019 22:26:01 +0000

On 11/13/19 2:07 PM, Brian Vazquez wrote:
> Hi Yonghong,
> 
> Thanks for reviewing it! I'm preparing the changes and will submit them 
> sooner.
> 
> As for the right way to manage author rights, does anyone know what the 
> correct approach is? Should I use Yonghong's patch and apply the 
> extended support in different patches (i.e. support per_cpu maps, change 
> batch from u64 to __aligned_u64, etc) or it is fine to apply the changes 
> in place and write both sign-offs?

The logic flow of the patch set is most important.
You can add me as co-signoff if you reuse significant portion of my code.

> 
> Thanks,
> Brian
> 
> On Tue, Nov 12, 2019 at 6:34 PM Yonghong Song <yhs@xxxxxx 
> <mailto:yhs@xxxxxx>> wrote:
> 
> 
> 
>     On 11/7/19 1:20 PM, Brian Vazquez wrote:
>      > This is a follow up in the effort to batch bpf map operations to
>     reduce
>      > the syscall overhead with the map_ops. I initially proposed the
>     idea and
>      > the discussion is here:
>      >
>     https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@xxxxxxxxxx/
>      >
>      > Yonghong talked at the LPC about this and also proposed and idea that
>      > handles the special weird case of hashtables by doing traversing
>     using
>      > the buckets as a reference instead of a key. Discussion is here:
>      > https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@xxxxxx/
>      >
>      > This RFC proposes a way to extend batch operations for more data
>      > structures by creating generic batch functions that can be used
>     instead
>      > of implementing the operations for each individual data structure,
>      > reducing the code that needs to be maintained. The series
>     contains the
>      > patches used in Yonghong's RFC and the patch that adds the generic
>      > implementation of the operations plus some testing with pcpu hashmaps
>      > and arrays. Note that pcpu hashmap shouldn't use the generic
>      > implementation and it either should have its own implementation
>     or share
>      > the one introduced by Yonghong, I added that just as an example
>     to show
>      > that the generic implementation can be easily added to a data
>     structure.
>      >
>      > What I want to achieve with this RFC is to collect early feedback
>     and see if
>      > there's any major concern about this before I move forward. I do plan
>      > to better separate this into different patches and explain them
>     properly
>      > in the commit messages.
> 
>     Thanks Brian for working on this. The general approach described here
>     is good to me. Having a generic implementation for batch operations
>     looks good for maps (not hash table, queue/stack, etc.)
> 
>      >
>      > Current known issues where I would like to discuss are the
>     followings:
>      >
>      > - Because Yonghong's UAPI definition was done specifically for
>      >    iterating buckets, the batch field is u64 and is treated as an u64
>      >    instead of an opaque pointer, this won't work for other data
>     structures
>      >    that are going to use a key as a batch token with a size
>     greater than
>      >    64. Although I think at this point the only key that couldn't be
>      >    treated as a u64 is the key of a hashmap, and the hashmap
>     won't use
>      >    the generic interface.
> 
>     The u64 can be changed with a __aligned_u64 opaque value. This way,
>     it can represent a pointer or a 64bit value.
> 
>      > - Not all the data structures use delete (because it's not a valid
>      >    operation) i.e. arrays. So maybe lookup_and_delete_batch
>     command is
>      >    not needed and we can handle that operation with a
>     lookup_batch and a
>      >    flag.
> 
>     This make sense.
> 
>      > - For delete_batch (not just the lookup_and_delete_batch). Is this
>      >    operation really needed? If so, shouldn't it be better if the
>      >    behaviour is delete the keys provided? I did that with my generic
>      >    implementation but Yonghong's delete_batch for a hashmap deletes
>      >    buckets.
> 
>     We need batched delete in bcc. lookup_and_delete_batch is better as
>     it can preserves more new map entries. Alternatively, deleting
>     all entries after lookup is another option. But this may remove
>     more new map entries. Statistically this may or may not matter though.
> 
>     bcc does have a clear_table (clear_map) API, but not clear who is
>     using it.
> 
>     So, I did not have a concrete use case for delete_batch yet.
>     I tend to think we should have delete_batch for API completeness,
>     but maybe other people can comment on this as well.
> 
>     Maybe initial patch, we can skip it. But we should still ensure
>     user interface data structure can handle batch delete if it is
>     needed later. The current data structure should handle this
>     as far as I know.
> 
>      >
>      > Brian Vazquez (1):
>      >    bpf: add generic batch support
>      >
>      > Yonghong Song (2):
>      >    bpf: adding map batch processing support
>      >    tools/bpf: test bpf_map_lookup_and_delete_batch()
>      >
>      >   include/linux/bpf.h                           |  21 +
>      >   include/uapi/linux/bpf.h                      |  22 +
>      >   kernel/bpf/arraymap.c                         |   4 +
>      >   kernel/bpf/hashtab.c                          | 331 ++++++++++
>      >   kernel/bpf/syscall.c                          | 573
>     ++++++++++++++----
>      >   tools/include/uapi/linux/bpf.h                |  22 +
>      >   tools/lib/bpf/bpf.c                           |  59 ++
>      >   tools/lib/bpf/bpf.h                           |  13 +
>      >   tools/lib/bpf/libbpf.map                      |   4 +
>      >   .../map_tests/map_lookup_and_delete_batch.c   | 245 ++++++++
>      >   .../map_lookup_and_delete_batch_array.c       | 118 ++++
>      >   11 files changed, 1292 insertions(+), 120 deletions(-)
>      >   create mode 100644
>     tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch.c
>      >   create mode 100644
>     tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c
>      >
>