On Fri, Oct 18, 2019 at 02:43:11PM +0100, Lorenz Bauer wrote:
> To iterate a BPF map, userspace must use MAP_GET_NEXT_KEY and provide
> the last retrieved key. The code then scans the hash table bucket
> for the key and returns the key of the next item.
>
> This presents a problem if the last retrieved key isn't present in the
> hash table anymore, e.g. due to concurrent deletion. It's not possible
> to ascertain the location of a key in a given bucket, so there isn't
> really a correct answer. The implementation currently returns the
> first key in the first bucket. This guarantees that we never skip an
> existing key. However, it means that a user space program iterating
> a heavily modified map may never reach the end of the hash table,
> forever restarting at the beginning.
>
> Fixing this outright is rather involved. However, we can improve slightly
> by never revisiting earlier buckets. Instead of the first key in the
> first bucket we return the first key in the "current" bucket. This
> doesn't eliminate the problem, but makes it less likely to occur.
>
> Prior to commit 8fe45924387b ("bpf: map_get_next_key to return first key on NULL")
> passing a non-existent key to MAP_GET_NEXT_KEY was the only way to
> find the first key. Hence there is a small chance that there is code that
> will be broken by this change.

There is a 100% chance that it will break older bcc tools that were written
before NULL was a possible argument for get_next_key.

Please see Yonghong's patches for batched map lookup.
That's the proper way to solve your problem.
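
For illustration only, here is a toy userspace model of the lookup described
in the quoted text. It is not the kernel's htab code: toy_htab,
toy_get_next_key() and the restart_in_current_bucket flag are made-up names,
the buckets are small fixed arrays, and the "hash" is simply key % NBUCKETS.
It only sketches how the two fallback choices differ when the key passed in
is no longer in the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUCKETS   4
#define BUCKET_CAP 4

/* Toy table (hypothetical): key k lives in bucket k % NBUCKETS. */
struct toy_htab {
	unsigned int keys[NBUCKETS][BUCKET_CAP];
	size_t len[NBUCKETS];
};

static size_t toy_bucket(unsigned int key)
{
	return key % NBUCKETS;
}

static bool toy_insert(struct toy_htab *t, unsigned int key)
{
	size_t b = toy_bucket(key);

	if (t->len[b] == BUCKET_CAP)
		return false;
	t->keys[b][t->len[b]++] = key;
	return true;
}

static bool toy_delete(struct toy_htab *t, unsigned int key)
{
	size_t b = toy_bucket(key);

	for (size_t i = 0; i < t->len[b]; i++) {
		if (t->keys[b][i] != key)
			continue;
		/* Shift the tail down so bucket order is preserved. */
		for (size_t j = i + 1; j < t->len[b]; j++)
			t->keys[b][j - 1] = t->keys[b][j];
		t->len[b]--;
		return true;
	}
	return false;
}

/* First key found in buckets [start, NBUCKETS), if any. */
static bool toy_first_from(const struct toy_htab *t, size_t start,
			   unsigned int *next)
{
	for (size_t b = start; b < NBUCKETS; b++) {
		if (t->len[b] > 0) {
			*next = t->keys[b][0];
			return true;
		}
	}
	return false;
}

/*
 * key == NULL: return the first key in the table.
 * key found:   return the key after it, moving on to later buckets.
 * key missing: restart from bucket 0 (behaviour described as current) or
 *              from the key's own bucket (behaviour described as proposed).
 * Returns false when there is no further key.
 */
static bool toy_get_next_key(const struct toy_htab *t, const unsigned int *key,
			     bool restart_in_current_bucket, unsigned int *next)
{
	assert(t != NULL && next != NULL);

	if (key == NULL)
		return toy_first_from(t, 0, next);

	size_t b = toy_bucket(*key);

	for (size_t i = 0; i < t->len[b]; i++) {
		if (t->keys[b][i] != *key)
			continue;
		if (i + 1 < t->len[b]) {
			*next = t->keys[b][i + 1];
			return true;
		}
		return toy_first_from(t, b + 1, next);
	}

	/* Key is gone, e.g. deleted while the caller was iterating. */
	return toy_first_from(t, restart_in_current_bucket ? b : 0, next);
}
```

In this model, insert keys 0, 1, 5 and 2 (1 and 5 share bucket 1), then delete
5. Asking for the key after 5 yields 0 when restarting from the first bucket
and 1 when restarting from the current bucket. A caller that passes a
non-existent key in order to find the first key therefore gets a different
answer under the second behaviour, which is the compatibility concern above.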