Hi Brian, Jiri thanks for the comments. On Mon, Feb 10, 2025 at 8:47 AM Brian Vazquez <brianvv@xxxxxxxxxx> wrote: > > On Mon, Feb 10, 2025 at 4:19 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote: > > > > On Sun, Feb 09, 2025 at 11:22:35PM -0800, Yan Zhai wrote: > > > The generic_map_lookup_batch currently returns EINTR if it fails with > > > ENOENT and retries several times on bpf_map_copy_value. The next batch > > > would start from the same location, presuming it's a transient issue. > > > This is incorrect if a map can actually have "holes", i.e. > > > "get_next_key" can return a key that does not point to a valid value. At > > > least the array of maps type may contain such holes legitly. Right now > > > these holes show up, generic batch lookup cannot proceed any more. It > > > will always fail with EINTR errors. > > > > > > Rather, do not retry in generic_map_lookup_batch. If it finds a non > > > existing element, skip to the next key. This simple solution comes with > > > a price that transient errors may not be recovered, and the iteration > > > might cycle back to the first key under parallel deletion. For example, > > > > probably stupid question, but why not keep the retry logic and when > > it fails then instead of returning EINTR just jump to the next key > > > > jirka > > +1, keeping the retry logic but moving to the next key on error sounds > like a sensible approach. > I made the trade off since retry would consistently fail for the array of maps, so it is merely wasting cycles to ever do so. It is already pretty slow to read these maps today from userspace (for us we read them for accounting/monitoring purposes), so it is nice to save a few cycles especially for sparse maps. E.g. We use inner maps to store protocol specific actions in an array of maps with 256 slots, but usually only a few common protocols like TCP/UDP/ICMP are populated, leaving most "holes". On the other hand, I personally feel it is really "fragile" if users rely heavily on this logic to survive concurrent lookup and deletion. Would it make more sense to provide concurrency guarantee with map specific ops like hash map? best Yan