Re: [PATCH v2 bpf 1/2] bpf: skip non exist keys in generic_map_lookup_batch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 6, 2025 at 9:45 PM Yan Zhai <yan@xxxxxxxxxxxxxx> wrote:
>
> The generic_map_lookup_batch currently returns EINTR if it fails with
> ENOENT and retries several times on bpf_map_copy_value. The next batch
> would start from the same location, presuming it's a transient issue.
> This is incorrect if a map can actually have "holes", i.e.
> "get_next_key" can return a key that does not point to a valid value. At
> least the array of maps type may contain such holes legitly. Right now
> these holes show up, generic batch lookup cannot proceed any more. It
> will always fail with EINTR errors.
>
> Rather, do not retry in generic_map_lookup_batch. If it finds a non
> existing element, skip to the next key. This simple solution comes with
> a price that transient errors may not be recovered, and the iteration
> might cycle back to the first key under parallel deletion. For example,
> Hou Tao <houtao@xxxxxxxxxxxxxxx> pointed out a following scenario:
>
> For LPM trie map:
> (1) ->map_get_next_key(map, prev_key, key) returns a valid key
>
> (2) bpf_map_copy_value() return -ENOMENT
> It means the key must be deleted concurrently.
>
> (3) goto next_key
> It swaps the prev_key and key
>
> (4) ->map_get_next_key(map, prev_key, key) again
> prev_key points to a non-existing key, for LPM trie it will treat just
> like prev_key=NULL case, the returned key will be duplicated.
>
> With the retry logic, the iteration can continue to the key next to the
> deleted one. But if we directly skip to the next key, the iteration loop
> would restart from the first key for the lpm_trie type.
>
> However, not all races may be recovered. For example, if current key is
> deleted after instead of before bpf_map_copy_value, or if the prev_key
> also gets deleted, then the loop will still restart from the first key
> for lpm_tire anyway. For generic lookup it might be better to stay
> simple, i.e. just skip to the next key. To guarantee that the output
> keys are not duplicated, it is better to implement map type specific
> batch operations, which can properly lock the trie and synchronize with
> concurrent mutators.
>
> Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op")
> Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/
> Signed-off-by: Yan Zhai <yan@xxxxxxxxxxxxxx>
> ---
> v1->v2: incorporate more useful information inside commit message.
> v1: https://lore.kernel.org/bpf/Z6OYbS4WqQnmzi2z@debian.debian/
> ---
>  kernel/bpf/syscall.c | 16 +++++-----------
>  1 file changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index c420edbfb7c8..5d0a4db6fb85 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1979,8 +1979,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
>         void __user *values = u64_to_user_ptr(attr->batch.values);
>         void __user *keys = u64_to_user_ptr(attr->batch.keys);
>         void *buf, *buf_prevkey, *prev_key, *key, *value;
> -       int err, retry = MAP_LOOKUP_RETRIES;
>         u32 value_size, cp, max_count;
> +       int err;
>
>         if (attr->batch.elem_flags & ~BPF_F_LOCK)
>                 return -EINVAL;
> @@ -2026,14 +2026,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
>                 err = bpf_map_copy_value(map, key, value,
>                                          attr->batch.elem_flags);
>
> -               if (err == -ENOENT) {
> -                       if (retry) {
> -                               retry--;
> -                               continue;
> -                       }
> -                       err = -EINTR;
> -                       break;
> -               }
> +               if (err == -ENOENT)
> +                       goto next_key;
>
>                 if (err)
>                         goto free_buf;
> @@ -2048,12 +2042,12 @@ int generic_map_lookup_batch(struct bpf_map *map,
>                         goto free_buf;
>                 }
>
> +               cp++;
> +next_key:
>                 if (!prev_key)
>                         prev_key = buf_prevkey;
>
>                 swap(prev_key, key);
> -               retry = MAP_LOOKUP_RETRIES;
> -               cp++;

#define MAP_LOOKUP_RETRIES is unused after this patch.
Delete it as well.

Pls keep acks on respin.

pw-bot: cr





[Index of Archives]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux