On Fri, Nov 22, 2019 at 3:06 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>
> On 11/22/19 11:55 PM, Andrii Nakryiko wrote:
> > On Fri, Nov 22, 2019 at 12:08 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
> >>
> >> This work adds program tracking to prog array maps. This is needed such
> >> that upon prog array updates/deletions we can fix up all programs which
> >> make use of this tail call map. We add ops->map_poke_{un,}track()
> >> helpers to maps to maintain the list of programs and ops->map_poke_run()
> >> for triggering the actual update.
> >>
> >> bpf_array_aux is extended to contain the list head and poke_mutex in
> >> order to serialize program patching during updates/deletions.
> >> bpf_free_used_maps() will untrack the program shortly before dropping
> >> the reference to the map. For clearing out the prog array once all urefs
> >> are dropped we need to use schedule_work() to have a sleepable context.
> >>
> >> The prog_array_map_poke_run() is triggered during updates/deletions and
> >> walks the maintained prog list. It checks in their poke_tabs whether the
> >> map and key is matching and runs the actual bpf_arch_text_poke() for
> >> patching in the nop or new jmp location. Depending on the type of update,
> >> we use one of BPF_MOD_{NOP_TO_JUMP,JUMP_TO_NOP,JUMP_TO_JUMP}.
> >>
> >> Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> >> ---
> >>  include/linux/bpf.h   |  12 +++
> >>  kernel/bpf/arraymap.c | 183 +++++++++++++++++++++++++++++++++++++++++-
> >>  kernel/bpf/core.c     |   9 ++-
> >>  kernel/bpf/syscall.c  |  20 +++--
> >>  4 files changed, 212 insertions(+), 12 deletions(-)
> >>
> >
> > [...]
> >
> >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >> index 5a9873e58a01..bb002f15b32a 100644
> >> --- a/kernel/bpf/syscall.c
> >> +++ b/kernel/bpf/syscall.c
> >> @@ -26,12 +26,13 @@
> >>  #include <linux/audit.h>
> >>  #include <uapi/linux/btf.h>
> >>
> >> -#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY || \
> >> -                          (map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
> >> -                          (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
> >> -                          (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
> >> +#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
> >> +                          (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
> >> +                          (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
> >> +#define IS_FD_PROG_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY)
> >>  #define IS_FD_HASH(map) ((map)->map_type == BPF_MAP_TYPE_HASH_OF_MAPS)
> >> -#define IS_FD_MAP(map) (IS_FD_ARRAY(map) || IS_FD_HASH(map))
> >> +#define IS_FD_MAP(map) (IS_FD_ARRAY(map) || IS_FD_PROG_ARRAY(map) || \
> >> +                        IS_FD_HASH(map))
> >>
> >>  #define BPF_OBJ_FLAG_MASK (BPF_F_RDONLY | BPF_F_WRONLY)
> >>
> >> @@ -878,7 +879,7 @@ static int map_lookup_elem(union bpf_attr *attr)
> >>                 err = bpf_percpu_cgroup_storage_copy(map, key, value);
> >>         } else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
> >>                 err = bpf_stackmap_copy(map, key, value);
> >> -       } else if (IS_FD_ARRAY(map)) {
> >> +       } else if (IS_FD_ARRAY(map) || IS_FD_PROG_ARRAY(map)) {
> >
> > Why couldn't BPF_MAP_TYPE_PROG_ARRAY still stay under "IS_FD_ARRAY"?
> > It seems like it's still handled the same here and is technically an
> > array containing FDs, no? You could still have the more precise
>
> For the update and delete methods we need to hold the mutex in the prog
> array, which is why we cannot be under RCU there. The lookup itself can
> stay as-is since it's not a modification, hence the "or" above.
>
> > IS_FD_PROG_ARRAY for cases like map_update_elem(), where you need to
> > special-handle just that map type.
> >
> >>                 err = bpf_fd_array_map_lookup_elem(map, key, value);
> >>         } else if (IS_FD_HASH(map)) {
> >>                 err = bpf_fd_htab_map_lookup_elem(map, key, value);
> >> @@ -1005,6 +1006,10 @@ static int map_update_elem(union bpf_attr *attr)
> >>                    map->map_type == BPF_MAP_TYPE_SOCKMAP) {
> >>                 err = map->ops->map_update_elem(map, key, value, attr->flags);
> >>                 goto out;
> >> +       } else if (IS_FD_PROG_ARRAY(map)) {
> >> +               err = bpf_fd_array_map_update_elem(map, f.file, key, value,
> >> +                                                  attr->flags);
> >> +               goto out;
> >>         }
> >>
> >>         /* must increment bpf_prog_active to avoid kprobe+bpf triggering from
> >> @@ -1087,6 +1092,9 @@ static int map_delete_elem(union bpf_attr *attr)
> >>         if (bpf_map_is_dev_bound(map)) {
> >>                 err = bpf_map_offload_delete_elem(map, key);
> >>                 goto out;
> >> +       } else if (IS_FD_PROG_ARRAY(map)) {
> >> +               err = map->ops->map_delete_elem(map, key);
> >
> > map->ops->map_delete_elem would be called below anyway, except under
> > rcu_read_lock() with preempt_disable() (maybe_wait_bpf_programs() is
> > a no-op for prog_array). So if there is a specific reason to avoid
> > preempt_disable() and rcu_read_lock(), it would be nice to have a
> > comment explaining that.
>
> See the answer above. I didn't add an explicit comment there since it's
> not done for all the others either, but it seems obvious when looking
> at the map implementation and prog_array_map_poke_run() /
> bpf_arch_text_poke().

Ok, fair enough. I was a bit lazy in trying to figure this out :)

Acked-by: Andrii Nakryiko <andriin@xxxxxx>

> Thanks,
> Daniel