On Mon, Feb 10, 2025 at 4:19 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote: > > On Sun, Feb 09, 2025 at 11:22:35PM -0800, Yan Zhai wrote: > > The generic_map_lookup_batch currently returns EINTR if it fails with > > ENOENT and retries several times on bpf_map_copy_value. The next batch > > would start from the same location, presuming it's a transient issue. > > This is incorrect if a map can actually have "holes", i.e. > > "get_next_key" can return a key that does not point to a valid value. At > > least the array of maps type may contain such holes legitly. Right now > > these holes show up, generic batch lookup cannot proceed any more. It > > will always fail with EINTR errors. > > > > Rather, do not retry in generic_map_lookup_batch. If it finds a non > > existing element, skip to the next key. This simple solution comes with > > a price that transient errors may not be recovered, and the iteration > > might cycle back to the first key under parallel deletion. For example, > > probably stupid question, but why not keep the retry logic and when > it fails then instead of returning EINTR just jump to the next key > > jirka +1, keeping the retry logic but moving to the next key on error sounds like a sensible approach. > > > > Hou Tao <houtao@xxxxxxxxxxxxxxx> pointed out a following scenario: > > > > For LPM trie map: > > (1) ->map_get_next_key(map, prev_key, key) returns a valid key > > > > (2) bpf_map_copy_value() return -ENOMENT > > It means the key must be deleted concurrently. > > > > (3) goto next_key > > It swaps the prev_key and key > > > > (4) ->map_get_next_key(map, prev_key, key) again > > prev_key points to a non-existing key, for LPM trie it will treat just > > like prev_key=NULL case, the returned key will be duplicated. > > > > With the retry logic, the iteration can continue to the key next to the > > deleted one. But if we directly skip to the next key, the iteration loop > > would restart from the first key for the lpm_trie type. > > > > However, not all races may be recovered. For example, if current key is > > deleted after instead of before bpf_map_copy_value, or if the prev_key > > also gets deleted, then the loop will still restart from the first key > > for lpm_tire anyway. For generic lookup it might be better to stay > > simple, i.e. just skip to the next key. To guarantee that the output > > keys are not duplicated, it is better to implement map type specific > > batch operations, which can properly lock the trie and synchronize with > > concurrent mutators. > > > > Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op") > > Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/ > > Signed-off-by: Yan Zhai <yan@xxxxxxxxxxxxxx> > > Acked-by: Hou Tao <houtao1@xxxxxxxxxx> > > --- > > v2->v3: deleted a used macro > > v1->v2: incorporate more useful information inside commit message. > > --- > > kernel/bpf/syscall.c | 18 +++++------------- > > 1 file changed, 5 insertions(+), 13 deletions(-) > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > index c420edbfb7c8..e5f1c7fd0ba7 100644 > > --- a/kernel/bpf/syscall.c > > +++ b/kernel/bpf/syscall.c > > @@ -1968,8 +1968,6 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file, > > return err; > > } > > > > -#define MAP_LOOKUP_RETRIES 3 > > - > > int generic_map_lookup_batch(struct bpf_map *map, > > const union bpf_attr *attr, > > union bpf_attr __user *uattr) > > @@ -1979,8 +1977,8 @@ int generic_map_lookup_batch(struct bpf_map *map, > > void __user *values = u64_to_user_ptr(attr->batch.values); > > void __user *keys = u64_to_user_ptr(attr->batch.keys); > > void *buf, *buf_prevkey, *prev_key, *key, *value; > > - int err, retry = MAP_LOOKUP_RETRIES; > > u32 value_size, cp, max_count; > > + int err; > > > > if (attr->batch.elem_flags & ~BPF_F_LOCK) > > return -EINVAL; > > @@ -2026,14 +2024,8 @@ int generic_map_lookup_batch(struct bpf_map *map, > > err = bpf_map_copy_value(map, key, value, > > attr->batch.elem_flags); > > > > - if (err == -ENOENT) { > > - if (retry) { > > - retry--; > > - continue; > > - } > > - err = -EINTR; > > - break; > > - } > > + if (err == -ENOENT) > > + goto next_key; > > > > if (err) > > goto free_buf; > > @@ -2048,12 +2040,12 @@ int generic_map_lookup_batch(struct bpf_map *map, > > goto free_buf; > > } > > > > + cp++; > > +next_key: > > if (!prev_key) > > prev_key = buf_prevkey; > > > > swap(prev_key, key); > > - retry = MAP_LOOKUP_RETRIES; > > - cp++; > > cond_resched(); > > } > > > > -- > > 2.39.5 > > > >