Re: [RFC PATCH v3 0/3] Introduce BPF map tracing capability

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Tue, 2 Nov 2021 17:12:45 -0700

On Tue, Nov 02, 2021 at 02:14:29AM +0000, Joe Burton wrote:
> From: Joe Burton <jevburton@xxxxxxxxxx>
> 
> This is the third version of a patch series implementing map tracing.
> 
> Map tracing enables executing BPF programs upon BPF map updates. This
> might be useful to perform upgrades of stateful programs; e.g., tracing
> programs can propagate changes to maps that occur during an upgrade
> operation.
> 
> This version uses trampoline hooks to provide the capability.
> fentry/fexit/fmod_ret programs can attach to two new functions:
>         int bpf_map_trace_update_elem(struct bpf_map* map, void* key,
>                 void* val, u32 flags);
>         int bpf_map_trace_delete_elem(struct bpf_map* map, void* key);
> 
> These hooks work as intended for the following map types:
>         BPF_MAP_TYPE_ARRAY
>         BPF_MAP_TYPE_PERCPU_ARRAY
>         BPF_MAP_TYPE_HASH
>         BPF_MAP_TYPE_PERCPU_HASH
>         BPF_MAP_TYPE_LRU_HASH
>         BPF_MAP_TYPE_LRU_PERCPU_HASH
> 
> The only guarantee about the semantics of these hooks is that they execute
> before the operation takes place. We cannot call them with locks held
> because the hooked program might try to acquire the same locks. Thus they
> may be invoked in situations where the traced map is not ultimately
> updated.
> 
> The original proposal suggested exposing a function for each
> (map type) x (access type). The problem I encountered is that e.g.
> percpu hashtables use a custom function for some access types
> (htab_percpu_map_update_elem) but a common function for others
> (htab_map_delete_elem). Thus a userspace application would have to
> maintain a unique list of functions to attach to for each map type;
> moreover, this list could change across kernel versions. Map tracing is
> easier to use with fewer functions, at the cost of tracing programs
> being triggered more times.

Good point about htab_percpu.
The patches look good to me.
A few minor bits:
- pls don't use #pragma once.
  There was a discussion not too long ago about it and the conclusion
  was not to use it.
  It slipped into a few selftests/bpf files, but let's not introduce more users.
- noinline is not needed in the prototype.
- bpf_probe_read is deprecated. Pls use bpf_probe_read_kernel.

And thanks for the detailed patch 3.

> To prevent the compiler from optimizing out the calls to my tracing
> functions, I use the asm("") trick described in gcc's
> __attribute__((noinline)) documentation. Experimentally, this trick
> works with clang as well.

I think noinline is enough. I don't think you need that asm in there.
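
A rough userspace sketch of the shape being suggested: the attribute sits
on the definition only, the prototype stays plain, and there is no asm("")
in the body. All names here (demo_map, demo_trace_update_elem,
demo_update_elem) are made up for illustration and are not from the patch.
Whether a given compiler keeps the call to an empty noinline function is
the open question in this thread, so treat this as the form under
discussion, not as proof that it works.

```c
#include <stdint.h>

/* Hypothetical stand-in for a map with a single int slot. */
struct demo_map {
	int value;
};

/* Prototype: no noinline needed here. */
int demo_trace_update_elem(struct demo_map *map, void *key, void *val,
			   uint32_t flags);

/* Definition: noinline keeps the hook an out-of-line, attachable symbol. */
__attribute__((noinline))
int demo_trace_update_elem(struct demo_map *map, void *key, void *val,
			   uint32_t flags)
{
	(void)map;
	(void)key;
	(void)val;
	(void)flags;
	return 0;
}

/* The hook is called before the update and with no locks held. */
int demo_update_elem(struct demo_map *map, void *key, void *val,
		     uint32_t flags)
{
	/* Assumption: a non-zero return from the hook aborts the update. */
	int err = demo_trace_update_elem(map, key, val, flags);

	if (err)
		return err;
	map->value = *(int *)val;
	return 0;
}
```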

In parallel let's figure out how to do:
SEC("fentry/bpf_map_trace_update_elem")
int BPF_PROG(copy_on_write__update,
             struct bpf_map *map,
             struct allow_reads_key__old *key,
             void *value, u64 map_flags)

It kinda sucks that bpf_probe_read_kernel is necessary to read key/values.
It would be much nicer to be able to specify the exact struct for the key and
access it directly.
The verifier does this already for the map iterator.
It's 'void *' on the kernel side while iterator prog can cast this pointer
to specific 'struct key *' and access it directly.
See bpf_iter_reg->ctx_arg_info and btf_ctx_access().

For fentry into bpf_map_trace_update_elem it's a bit more challenging,
since it will be called for all maps and there is no way to statically
check that specific_map->key_size is within prog->aux->max_rdonly_access.

Maybe we can do a dynamic cast helper (similar to those that cast sockets)
that will check for key_size at run-time?
Another alternative is to allow 'void *' -> PTR_TO_BTF_ID conversion
and let inlined probe_read do the job.
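
To make the run-time key_size check concrete, here is a plain userspace
model of what such a cast could do. It is not kernel code and not an
existing helper: model_map, example_key and model_cast_key are
hypothetical names, and the "key_size must be at least the size the
program wants to read" rule is my assumption about what the check would
be.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the one struct bpf_map field that matters here. */
struct model_map {
	uint32_t key_size;
};

/* Hypothetical key layout a tracing program might expect. */
struct example_key {
	uint32_t ifindex;
	uint32_t flags;
};

/*
 * Hypothetical run-time cast: return the key pointer only when the map's
 * key is at least as large as the struct the caller wants to read.
 * Returns NULL otherwise, so the caller must check before dereferencing.
 */
static const void *model_cast_key(const struct model_map *map,
				  const void *key, size_t want)
{
	if (!map || !key || map->key_size < want)
		return NULL;
	return key;
}
```

A program would call it with sizeof(struct example_key) and bail out on
NULL, which is the same pattern the socket cast helpers impose on their
callers.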