To iterate a BPF map, userspace must use MAP_GET_NEXT_KEY and provide the last retrieved key. The code then scans the hash table bucket for the key and returns the key of the next item. This presents a problem if the last retrieved key isn't present in the hash table anymore, e.g. due to concurrent deletion. It's not possible to ascertain the location of a key in a given bucket, so there isn't really a correct answer. The implementation currently returns the first key in the first bucket. This guarantees that we never skip an existing key. However, it means that a user space program iterating a heavily modified map may never reach the end of the hash table, forever restarting at the beginning. Fixing this outright is rather involved. However, we can improve slightly by never revisiting earlier buckets. Instead of the first key in the first bucket we return the first key in the "current" bucket. This doesn't eliminate the problem, but makes it less likely to occur. Prior to commit 8fe45924387b ("bpf: map_get_next_key to return first key on NULL") passing a non-existent key to MAP_GET_NEXT_KEY was the only way to find the first key. Hence there is a small chance that there is code that will be broken by this change. Fixes: 8fe45924387b ("bpf: map_get_next_key to return first key on NULL") Signed-off-by: Lorenz Bauer <lmb@xxxxxxxxxxxxxx> --- kernel/bpf/hashtab.c | 4 +++- tools/testing/selftests/bpf/test_maps.c | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 22066a62c8c9..30f0dab488f0 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -613,6 +613,9 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key) head = select_bucket(htab, hash); + /* don't iterate previous buckets */ + i = hash & (htab->n_buckets - 1); + /* lookup the key */ l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets); @@ -630,7 +633,6 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key) } /* no more elements in this hash list, go to the next bucket */ - i = hash & (htab->n_buckets - 1); i++; find_first_elem: diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c index 806b298397d3..6f351e532ddc 100644 --- a/tools/testing/selftests/bpf/test_maps.c +++ b/tools/testing/selftests/bpf/test_maps.c @@ -100,7 +100,7 @@ static void test_hashmap(unsigned int task, void *data) assert(bpf_map_get_next_key(fd, NULL, &first_key) == 0 && (first_key == 1 || first_key == 2)); assert(bpf_map_get_next_key(fd, &key, &next_key) == 0 && - (next_key == first_key)); + (next_key == 1 || next_key == 2)); assert(bpf_map_get_next_key(fd, &next_key, &next_key) == 0 && (next_key == 1 || next_key == 2) && (next_key != first_key)); -- 2.20.1