On Fri, Nov 22, 2019 at 12:08 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>
> This work adds program tracking to prog array maps. This is needed such
> that upon prog array updates/deletions we can fix up all programs which
> make use of this tail call map. We add ops->map_poke_{un,}track()
> helpers to maps to maintain the list of programs and ops->map_poke_run()
> for triggering the actual update.
>
> bpf_array_aux is extended to contain the list head and poke_mutex in
> order to serialize program patching during updates/deletions.
> bpf_free_used_maps() will untrack the program shortly before dropping
> the reference to the map. For clearing out the prog array once all urefs
> are dropped we need to use schedule_work() to have a sleepable context.
>
> The prog_array_map_poke_run() is triggered during updates/deletions and
> walks the maintained prog list. It checks in their poke_tabs whether the
> map and key is matching and runs the actual bpf_arch_text_poke() for
> patching in the nop or new jmp location. Depending on the type of update,
> we use one of BPF_MOD_{NOP_TO_JUMP,JUMP_TO_NOP,JUMP_TO_JUMP}.
>
> Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> ---
>  include/linux/bpf.h   |  12 +++
>  kernel/bpf/arraymap.c | 183 +++++++++++++++++++++++++++++++++++++++++-
>  kernel/bpf/core.c     |   9 ++-
>  kernel/bpf/syscall.c  |  20 +++--
>  4 files changed, 212 insertions(+), 12 deletions(-)
>

[...]

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 5a9873e58a01..bb002f15b32a 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -26,12 +26,13 @@
>  #include <linux/audit.h>
>  #include <uapi/linux/btf.h>
>
> -#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY || \
> -                          (map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
> -                          (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
> -                          (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
> +#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
> +                          (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
> +                          (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
> +#define IS_FD_PROG_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY)
>  #define IS_FD_HASH(map)  ((map)->map_type == BPF_MAP_TYPE_HASH_OF_MAPS)
> -#define IS_FD_MAP(map)   (IS_FD_ARRAY(map) || IS_FD_HASH(map))
> +#define IS_FD_MAP(map)   (IS_FD_ARRAY(map) || IS_FD_PROG_ARRAY(map) || \
> +                          IS_FD_HASH(map))
>
>  #define BPF_OBJ_FLAG_MASK (BPF_F_RDONLY | BPF_F_WRONLY)
>
> @@ -878,7 +879,7 @@ static int map_lookup_elem(union bpf_attr *attr)
>                 err = bpf_percpu_cgroup_storage_copy(map, key, value);
>         } else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
>                 err = bpf_stackmap_copy(map, key, value);
> -       } else if (IS_FD_ARRAY(map)) {
> +       } else if (IS_FD_ARRAY(map) || IS_FD_PROG_ARRAY(map)) {

Why couldn't BPF_MAP_TYPE_PROG_ARRAY still stay under IS_FD_ARRAY? It seems
like it's still handled the same here and is technically an array containing
FDs, no? You could still have the more precise IS_FD_PROG_ARRAY for cases
like map_update_elem(), where you need to special-handle just that map type.
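Roughly the split I have in mind is below (untested sketch against this
patch, just to illustrate; not claiming this is exactly how it should land):

/* Untested sketch: PROG_ARRAY stays covered by IS_FD_ARRAY as before, so
 * IS_FD_MAP and the map_lookup_elem() check above don't need to change.
 * The narrower IS_FD_PROG_ARRAY is then only used where prog arrays really
 * need special handling, e.g. map_update_elem()/map_delete_elem().
 */
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY || \
                          (map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
                          (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
                          (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
#define IS_FD_PROG_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY)
#define IS_FD_HASH(map)  ((map)->map_type == BPF_MAP_TYPE_HASH_OF_MAPS)
#define IS_FD_MAP(map)   (IS_FD_ARRAY(map) || IS_FD_HASH(map))

That would keep the generic FD-array handling untouched and confine the
prog-array special casing to the update/delete paths.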
>                 err = bpf_fd_array_map_lookup_elem(map, key, value);
>         } else if (IS_FD_HASH(map)) {
>                 err = bpf_fd_htab_map_lookup_elem(map, key, value);
> @@ -1005,6 +1006,10 @@ static int map_update_elem(union bpf_attr *attr)
>                    map->map_type == BPF_MAP_TYPE_SOCKMAP) {
>                 err = map->ops->map_update_elem(map, key, value, attr->flags);
>                 goto out;
> +       } else if (IS_FD_PROG_ARRAY(map)) {
> +               err = bpf_fd_array_map_update_elem(map, f.file, key, value,
> +                                                  attr->flags);
> +               goto out;
>         }
>
>         /* must increment bpf_prog_active to avoid kprobe+bpf triggering from
> @@ -1087,6 +1092,9 @@ static int map_delete_elem(union bpf_attr *attr)
>         if (bpf_map_is_dev_bound(map)) {
>                 err = bpf_map_offload_delete_elem(map, key);
>                 goto out;
> +       } else if (IS_FD_PROG_ARRAY(map)) {
> +               err = map->ops->map_delete_elem(map, key);

map->ops->map_delete_elem() would be called below anyway, just under
rcu_read_lock() with preempt_disable() (maybe_wait_bpf_programs() is a no-op
for prog_array). So if there is a specific reason to avoid preempt_disable()
and rcu_read_lock() here, it would be nice to have a comment explaining that;
see the sketch after the quoted patch.

> +               goto out;
>         }
>
>         preempt_disable();
> --
> 2.21.0
>
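On the map_delete_elem() comment I'm asking for above: I'm only guessing at
the reason, but if it is that the prog array delete path now ends up patching
tail call sites (prog_array_map_poke_run() / bpf_arch_text_poke()) under
poke_mutex and therefore needs a sleepable context, then something along
these lines would already help readers (sketch only, wording up to you):

        } else if (IS_FD_PROG_ARRAY(map)) {
                /* Deleting from a prog array may have to patch tail call
                 * sites via prog_array_map_poke_run(), which is done under
                 * a mutex and thus needs a sleepable context. Hence handle
                 * it here instead of under the preempt_disable() /
                 * rcu_read_lock() section below.
                 */
                err = map->ops->map_delete_elem(map, key);
                goto out;
        }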