Re: handling EINTR from bpf_map_lookup_batch

Hi,

On 2/5/2025 2:08 AM, Yan Zhai wrote:
> I am getting EINTR when trying to use bpf_map_lookup_batch on an
> array_of_maps. The error happens when there is a "hole" in the array.
> For example, say the outer map has max entries of 256, each inner map
> is used for a transport protocol, and I only populated key 6 and
> 17 for TCP and UDP. Then when I do batch lookup, I always get EINTR.
> This so far seems to only happen with array of maps. Does it make
> sense to allow skipping to the next key for this map type? Something
> like:
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index c420edbfb7c8..83915a8059ef 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2027,6 +2027,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
>                                          attr->batch.elem_flags);
>
>                 if (err == -ENOENT) {
> +                       if (IS_FD_ARRAY(map))
> +                               goto next_key;

It seems that, of the map types matched by IS_FD_ARRAY(), only
BPF_MAP_TYPE_ARRAY_OF_MAPS supports batched operations, so checking
map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS will be enough. It is also
better to reset err to 0 before jumping to next_key; otherwise
generic_map_lookup_batch() may return -ENOENT.
>                         if (retry) {
>                                 retry--;
>                                 continue;
> @@ -2048,6 +2050,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
>                         goto free_buf;
>                 }
>
> +next_key:
>                 if (!prev_key)
>                         prev_key = buf_prevkey;
>

Makes sense. Please add a selftest for it. Another option is to return
id 0 for the non-existent entries in the fd array, but that may break
existing programs, so just skipping the empty array slots is better.
> Also the context about my scenario if anyone is curious: I am trying
> to associate each map to a userspace service in a multi tenant
> environment. This is an addition to cgroup accounting, in case the
> creator cgroup goes away, e.g. systemd service restarts always
> recreate cgroups. And we also want to monitor the utilization level of
> non-prealloc maps of different tenants. When dealing with inner maps,
> it is not always trivial. To connect dots I choose to read these IDs
> periodically and link them to the tenant of the outer map, that's
> where this EINTR occurred.
>
> best
> Yan
>
> .




