Andrey Ignatov wrote:
> There are multiple use-cases when it's convenient to have access to bpf
> map fields, both `struct bpf_map` and map type specific struct-s such as
> `struct bpf_array`, `struct bpf_htab`, etc.
>
> For example while working with sock arrays it can be necessary to
> calculate the key based on map->max_entries (some_hash % max_entries).
> Currently this is solved by communicating max_entries via "out-of-band"
> channel, e.g. via additional map with known key to get info about target
> map. That works, but is not very convenient and error-prone while
> working with many maps.
>
> In other cases necessary data is dynamic (i.e. unknown at loading time)
> and it's impossible to get it at all. For example while working with a
> hash table it can be convenient to know how much capacity is already
> used (bpf_htab.count.counter for BPF_F_NO_PREALLOC case).
>
> At the same time kernel knows this info and can provide it to bpf
> program.
>
> Fill this gap by adding support to access bpf map fields from bpf
> program for both `struct bpf_map` and map type specific fields.
>
> Support is implemented via btf_struct_access() so that a user can define
> their own `struct bpf_map` or map type specific struct in their program
> with only necessary fields and preserve_access_index attribute, cast a
> map to this struct and use a field.
>
> For example:
>
> 	struct bpf_map {
> 		__u32 max_entries;
> 	} __attribute__((preserve_access_index));
>
> 	struct bpf_array {
> 		struct bpf_map map;
> 		__u32 elem_size;
> 	} __attribute__((preserve_access_index));
>
> 	struct {
> 		__uint(type, BPF_MAP_TYPE_ARRAY);
> 		__uint(max_entries, 4);
> 		__type(key, __u32);
> 		__type(value, __u32);
> 	} m_array SEC(".maps");
>
> 	SEC("cgroup_skb/egress")
> 	int cg_skb(void *ctx)
> 	{
> 		struct bpf_array *array = (struct bpf_array *)&m_array;
> 		struct bpf_map *map = (struct bpf_map *)&m_array;
>
> 		/* .. use map->max_entries or array->map.max_entries .. */
> 	}
>
> Similarly to other btf_struct_access() use-cases (e.g. struct tcp_sock
> in net/ipv4/bpf_tcp_ca.c) the patch allows access to any fields of
> corresponding struct. Only reading from map fields is supported.
>
> For btf_struct_access() to work there should be a way to know btf id of
> a struct that corresponds to a map type. To get btf id there should be a
> way to get a stringified name of map-specific struct, such as
> "bpf_array", "bpf_htab", etc for a map type. Two new fields are added to
> `struct bpf_map_ops` to handle it:
> * .map_btf_name keeps a btf name of a struct returned by map_alloc();
> * .map_btf_id is used to cache btf id of that struct.
>
> To make btf ids calculation cheaper they're calculated once while
> preparing btf_vmlinux and cached same way as it's done for btf_id field
> of `struct bpf_func_proto`
>
> While calculating btf ids, struct names are NOT checked for collision.
> Collisions will be checked as a part of the work to prepare btf ids used
> in verifier in compile time that should land soon. The only known
> collision for `struct bpf_htab` (kernel/bpf/hashtab.c vs
> net/core/sock_map.c) was fixed earlier.
>
> Both new fields .map_btf_name and .map_btf_id must be set for a map type
> for the feature to work. If neither is set for a map type, verifier will
> return ENOTSUPP on a try to access map_ptr of corresponding type. If
> just one of them set, it's verifier misconfiguration.
>
> Only `struct bpf_array` for BPF_MAP_TYPE_ARRAY and `struct bpf_htab` for
> BPF_MAP_TYPE_HASH are supported by this patch. Other map types will be
> supported separately.
>
> The feature is available only for CONFIG_DEBUG_INFO_BTF=y and gated by
> perfmon_capable() so that unpriv programs won't have access to bpf map
> fields.
>
> Signed-off-by: Andrey Ignatov <rdna@xxxxxx>
> ---
>  include/linux/bpf.h                           |  9 ++
>  include/linux/bpf_verifier.h                  |  1 +
>  kernel/bpf/arraymap.c                         |  3 +
>  kernel/bpf/btf.c                              | 40 +++++++++
>  kernel/bpf/hashtab.c                          |  3 +
>  kernel/bpf/verifier.c                         | 82 +++++++++++++++++--
>  .../selftests/bpf/verifier/map_ptr_mixing.c   |  2 +-
>  7 files changed, 131 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 07052d44bca1..1e1501ee53ce 100644

LGTM, but is there any reason not to allow this with bpf_capable()? It
looks useful for building load balancers, which might not be related to
CAP_PERFMON.

Otherwise,

Acked-by: John Fastabend <john.fastabend@xxxxxxxxx>
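
For readers following the thread, the three-way rule the commit message
states for .map_btf_name and .map_btf_id (both set: supported; neither set:
ENOTSUPP; only one set: misconfiguration) can be sketched in plain userspace
C. This is only an illustration, not the code from the patch: the struct
map_ops_sketch, the enum map_ptr_status, and classify_map_ops() are all
hypothetical names made up here, and the real check lives in the verifier and
btf code touched by the diffstat above.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the two new fields the patch adds to
 * `struct bpf_map_ops`. The names mirror the commit message; the
 * struct itself is invented for this sketch.
 */
struct map_ops_sketch {
	const char *map_btf_name; /* e.g. "bpf_array"; NULL when unset */
	int *map_btf_id;          /* cache slot for the btf id; NULL when unset */
};

/* Hypothetical result codes for the three cases in the commit message. */
enum map_ptr_status {
	MAP_PTR_SUPPORTED,     /* both fields set: the feature works */
	MAP_PTR_NOT_SUPPORTED, /* neither set: described as ENOTSUPP */
	MAP_PTR_MISCONFIGURED, /* exactly one set: verifier misconfiguration */
};

/* Classify a map type by which of the two fields its ops provide. */
static enum map_ptr_status classify_map_ops(const struct map_ops_sketch *ops)
{
	int has_name = ops->map_btf_name != NULL;
	int has_id = ops->map_btf_id != NULL;

	if (has_name && has_id)
		return MAP_PTR_SUPPORTED;
	if (!has_name && !has_id)
		return MAP_PTR_NOT_SUPPORTED;
	return MAP_PTR_MISCONFIGURED;
}
```

Under these assumptions, ops carrying both "bpf_array" and an id slot would
classify as supported, while a map type that sets neither field would fall
into the not-supported case that the commit message maps to ENOTSUPP.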