Hi,

+cc bpf list

On 5/6/2024 11:19 PM, Chase Hiltz wrote:
> Hi,
>
> I'm writing regarding a rather bizarre scenario that I'm hoping
> someone could provide insight on. I have a map defined as follows:
> ```
> struct {
>         __uint(type, BPF_MAP_TYPE_LRU_HASH);
>         __uint(max_entries, 1000000);
>         __type(key, struct my_map_key);
>         __type(value, struct my_map_val);
>         __uint(map_flags, BPF_F_NO_COMMON_LRU);
>         __uint(pinning, LIBBPF_PIN_BY_NAME);
> } my_map SEC(".maps");
> ```
> I have several fentry/fexit programs that need to perform updates in
> this map. After a certain number of map entries has been reached,
> calls to bpf_map_update_elem start returning `-ENOMEM`. As one
> example, I'm observing a program deployment where we have 816032
> entries on a 64 CPU machine, and a certain portion of updates are
> failing. I'm puzzled as to why this is occurring given that:
> - The 1M entries should be preallocated upon map creation (since I'm
> not using `BPF_F_NO_PREALLOC`)
> - The host machine has over 120G of unused memory available at any given time
>
> I've previously reduced max_entries by 25% under the assumption that
> this would prevent the problem from occurring, but this only caused

For an LRU map with BPF_F_NO_COMMON_LRU, max_entries is distributed
evenly between all CPUs. In your case, each CPU will have 1M/64 =
15625 entries. To reduce the possibility of the ENOMEM error, the
right way is to increase the value of max_entries instead of
decreasing it.

> map updates to start failing at a lower threshold. I believe that this
> is a problem with maps using the `BPF_F_NO_COMMON_LRU` flag, my
> reasoning being that when map updates fail, it occurs consistently for
> specific CPUs.

Does the specific CPU always fail afterwards, or does it fail
periodically? Is the machine running the bpf program an arm64 host or
an x86-64 host (namely uname -a)? I suspect that the problem may be
due to htab_lock_bucket(), which may fail on an arm64 host in v5.15.
Could you please check and count the ratio of times when
htab_lru_map_delete_node() returns 0? If the ratio is high, it
probably means that there are too many overwrites of entries between
different CPUs (e.g., CPU 0 updates key=X, then CPU 1 updates the same
key again).

> At this time, all machines experiencing the problem are running kernel
> version 5.15, however I'm not currently able to try out any newer
> kernels to confirm whether or not the same problem occurs there. Any
> ideas on what could be responsible for this would be greatly
> appreciated!
>
> Thanks,
> Chase Hiltz
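If it helps, one way to sample that ratio is a kretprobe on the
function. The following is an untested sketch only:
htab_lru_map_delete_node() is static in kernel/bpf/hashtab.c, so
attaching works only when the symbol survives inlining and is visible
in /proc/kallsyms, and the program/global names here are made up for
the example.

```c
/* Untested sketch: count how often htab_lru_map_delete_node() returns
 * nonzero vs. zero. Read the counts array from userspace (e.g. via a
 * libbpf skeleton) and compute the ratio. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* counts[0]: returned 0 (no node removed), counts[1]: returned nonzero */
__u64 counts[2];

SEC("kretprobe/htab_lru_map_delete_node")
int BPF_KRETPROBE(count_lru_delete_node, int ret)
{
	__sync_fetch_and_add(&counts[ret ? 1 : 0], 1);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```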