Daniel Borkmann <daniel@xxxxxxxxxxxxx> writes: > On 10/25/21 4:28 PM, Lorenzo Bianconi wrote: >>> Lorenzo noticed that the code testing for program type compatibility of >>> tail call maps is potentially racy in that two threads could encounter a >>> map with an unset type simultaneously and both return true even though they >>> are inserting incompatible programs. >>> >>> The race window is quite small, but artificially enlarging it by adding a >>> usleep_range() inside the check in bpf_prog_array_compatible() makes it >>> trivial to trigger from userspace with a program that does, essentially: >>> >>> map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0); >>> pid = fork(); >>> if (pid) { >>> key = 0; >>> value = xdp_fd; >>> } else { >>> key = 1; >>> value = tc_fd; >>> } >>> err = bpf_map_update_elem(map_fd, &key, &value, 0); >>> >>> While the race window is small, it has potentially serious ramifications in >>> that triggering it would allow a BPF program to tail call to a program of a >>> different type. So let's get rid of it by protecting the update with a >>> spinlock. The commit in the Fixes tag is the last commit that touches the >>> code in question. >>> >>> v2: >>> - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei) >>> >>> Fixes: 3324b584b6f6 ("ebpf: misc core cleanup") >>> Reported-by: Lorenzo Bianconi <lorenzo.bianconi@xxxxxxxxxx> >>> Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx> >>> --- >>> include/linux/bpf.h | 1 + >>> kernel/bpf/arraymap.c | 1 + >>> kernel/bpf/core.c | 14 ++++++++++---- >>> kernel/bpf/syscall.c | 2 ++ >>> 4 files changed, 14 insertions(+), 4 deletions(-) >>> >>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h >>> index 020a7d5bf470..98d906176d89 100644 >>> --- a/include/linux/bpf.h >>> +++ b/include/linux/bpf.h >>> @@ -929,6 +929,7 @@ struct bpf_array_aux { >>> * stored in the map to make sure that all callers and callees have >>> * the same prog type and JITed flag. >>> */ >>> + spinlock_t type_check_lock; >> >> I was wondering if we can use a mutex instead of a spinlock here since it is >> run from a syscall AFAIU. The only downside is mutex_lock is run inside >> aux->used_maps_mutex critical section. Am I missing something? > > Hm, potentially it could work, but then it's also 32 vs 4 extra bytes. There's > also poke_mutex or freeze_mutex, but feels to hacky to 'generalize for reuse', > so I think the spinlock in bpf_array_aux is fine. > >>> enum bpf_prog_type type; >>> bool jited; >>> /* Programs with direct jumps into programs part of this array. */ >>> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c >>> index cebd4fb06d19..da9b1e96cadc 100644 >>> --- a/kernel/bpf/arraymap.c >>> +++ b/kernel/bpf/arraymap.c >>> @@ -1072,6 +1072,7 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr) >>> INIT_WORK(&aux->work, prog_array_map_clear_deferred); >>> INIT_LIST_HEAD(&aux->poke_progs); >>> mutex_init(&aux->poke_mutex); >>> + spin_lock_init(&aux->type_check_lock); > > Just as a tiny nit, I would probably name it slightly different, since type_check_lock > mainly refers to the type property but there's also jit vs non-jit and as pointed out > there could be other extensions that need checking in future as well. Maybe 'compat_lock' > would be a more generic one or just: > > struct { > enum bpf_prog_type type; > bool jited; > spinlock_t lock; > } owner; Uh, I like that! Makes it easier to move as well (which we're doing as part of the xdp_mb series). Will send a v3 with this :) -Toke