Re: bpf_map_update_elem returns -ENOMEM

Chase Hiltz <chase@xxxxxxxx> writes:

> Hi,
>
> I'm writing regarding a rather bizarre scenario that I'm hoping
> someone could provide insight on. I have a map defined as follows:
> ```
> struct {
>     __uint(type, BPF_MAP_TYPE_LRU_HASH);
>     __uint(max_entries, 1000000);
>     __type(key, struct my_map_key);
>     __type(value, struct my_map_val);
>     __uint(map_flags, BPF_F_NO_COMMON_LRU);
>     __uint(pinning, LIBBPF_PIN_BY_NAME);
> } my_map SEC(".maps");
> ```
> I have several fentry/fexit programs that need to perform updates in
> this map. After a certain number of map entries has been reached,
> calls to bpf_map_update_elem start returning `-ENOMEM`. As one
> example, I'm observing a program deployment where we have 816032
> entries on a 64 CPU machine, and a certain portion of updates are
> failing. I'm puzzled as to why this is occurring given that:
> - The 1M entries should be preallocated upon map creation (since I'm
> not using `BPF_F_NO_PREALLOC`)
> - The host machine has over 120G of unused memory available at any
> given time

I hoped that I might be able to help here, given that I wrote the
documentation for BPF_MAP_TYPE_LRU_HASH. Unfortunately the details of
LRU eviction are complex, especially when using BPF_F_NO_COMMON_LRU for
per-cpu LRU lists.
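
To make the per-cpu point concrete, here is a rough sketch of the arithmetic.
It assumes that with BPF_F_NO_COMMON_LRU the preallocated elements are split
evenly across the possible CPUs, after max_entries is rounded up to a multiple
of the CPU count. That is my reading of the LRU code and is worth checking
against the 5.15 source. The helper name per_cpu_lru_share is hypothetical and
is not a kernel or libbpf API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): estimate how many preallocated
 * LRU elements each CPU would own if max_entries is rounded up to a
 * multiple of nr_cpus and then divided evenly. Assumption, not verified
 * against any particular kernel version.
 */
static uint32_t per_cpu_lru_share(uint32_t max_entries, uint32_t nr_cpus)
{
    assert(nr_cpus > 0);

    /* Use 64-bit arithmetic so the round-up cannot overflow. */
    uint64_t rounded = ((uint64_t)max_entries + nr_cpus - 1) / nr_cpus * nr_cpus;

    return (uint32_t)(rounded / nr_cpus);
}
```

Under that assumption, per_cpu_lru_share(1000000, 64) gives 15625 elements per
CPU, so a CPU that does most of the updates may run out of local free
elements long before the map as a whole reaches 1M entries. Note that the
number of possible CPUs can be larger than the number of online CPUs.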

Joe Stringer updated the LRU documentation and added a flowchart that
you might find helpful:

https://docs.kernel.org/bpf/map_hash.html

Joe also gave a talk about LRU maps at LPC a couple of years ago which
might give some insight:

https://lpc.events/event/16/contributions/1368/

> I've previously reduced max_entries by 25% under the assumption that
> this would prevent the problem from occurring, but this only caused
> map updates to start failing at a lower threshold. I believe that this
> is a problem with maps using the `BPF_F_NO_COMMON_LRU` flag, my
> reasoning being that when map updates fail, it occurs consistently for
> specific CPUs.
> At this time, all machines experiencing the problem are running kernel
> version 5.15, however I'm not currently able to try out any newer
> kernels to confirm whether or not the same problem occurs there. Any
> ideas on what could be responsible for this would be greatly
> appreciated!

There have been several updates to the LRU map code since 5.15, so it is
definitely possible that it will behave differently on a 6.x kernel.

> Thanks,
> Chase Hiltz



