On 10/18/19 9:03 AM, Alexei Starovoitov wrote: > On Fri, Oct 18, 2019 at 02:43:11PM +0100, Lorenz Bauer wrote: >> To iterate a BPF map, userspace must use MAP_GET_NEXT_KEY and provide >> the last retrieved key. The code then scans the hash table bucket >> for the key and returns the key of the next item. >> >> This presents a problem if the last retrieved key isn't present in the >> hash table anymore, e.g. due to concurrent deletion. It's not possible >> to ascertain the location of a key in a given bucket, so there isn't >> really a correct answer. The implementation currently returns the >> first key in the first bucket. This guarantees that we never skip an >> existing key. However, it means that a user space program iterating >> a heavily modified map may never reach the end of the hash table, >> forever restarting at the beginning. >> >> Fixing this outright is rather involved. However, we can improve slightly >> by never revisiting earlier buckets. Instead of the first key in the >> first bucket we return the first key in the "current" bucket. This >> doesn't eliminate the problem, but makes it less likely to occur. >> >> Prior to commit 8fe45924387b ("bpf: map_get_next_key to return first key on NULL") >> passing a non-existent key to MAP_GET_NEXT_KEY was the only way to >> find the first key. Hence there is a small chance that there is code that >> will be broken by this change. > > It is 100% chance that it will break older bcc tools that were written > before NULL was possible argument for get_next_key. The referenced bcc code is in below: https://github.com/iovisor/bcc/blob/master/src/cc/libbpf.c#L301-L330 > Please see Yonghong's patches for batched map lookup. This is my RFC patch. https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@xxxxxx/T/#t I have not got time to finish it as a proper patch set yet. But hopefully soon can find some time to work on this. > That's the proper way to solve your problem. >