Hi,

Thanks for the replies.

> Joe also gave a talk about LRU maps LPC a couple of years ago which
> might give some insight:

Thanks, this was very helpful in understanding how LRU eviction works! I
definitely think it's related to high levels of contention on individual
machines causing LRU eviction to fail, given that I'm only seeing it
occur on the machines that consistently process the most packets.

> There have been several updates to the LRU map code since 5.15 so it is
> definitely possible that it will behave differently on a 6.x kernel.

I've compared the implementations in 5.15 and 6.5 (the version I'd
consider upgrading to) and saw no more than a few refactoring changes,
but of course it's possible that I missed something.

> In order to reduce of possibility of ENOMEM error, the right
> way is to increase the value of max_entries instead of decreasing it.

Yes, I now see the error in my earlier thinking: reducing max_entries
doesn't help at all, it actually hurts. For the time being, I'm going to
increase max_entries as a temporary remediation.

> Does the specific CPU always fail afterwards, or does it fail
> periodically ? Is the machine running the bpf program an arm64 host or
> an x86-64 host (namely uname -a) ? I suspect that the problem may be due
> to htab_lock_bucket() which may fail under arm64 host in v5.15

It always fails afterwards. We're doing RSS, and we notice this problem
occurring back-to-back for specific source-destination pairs (because
they always land on the same queue). This is an x86-64 system:

```
$ uname -a
5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
```

> Could you please check and account the ratio of times when
> htab_lru_map_delete_node() returns 0 ? If the ratio high, it probably
> means that there may be too many overwrites of entries between different
> CPUs (e.g., CPU 0 updates key=X, then CPU 1 updates the same key again).
I'm not aware of any way to get that information; if you have any
pointers, I'd be happy to check this.

On Thu, 16 May 2024 at 07:29, Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> +cc bpf list
>
> On 5/6/2024 11:19 PM, Chase Hiltz wrote:
> > Hi,
> >
> > I'm writing regarding a rather bizarre scenario that I'm hoping
> > someone could provide insight on. I have a map defined as follows:
> > ```
> > struct {
> >     __uint(type, BPF_MAP_TYPE_LRU_HASH);
> >     __uint(max_entries, 1000000);
> >     __type(key, struct my_map_key);
> >     __type(value, struct my_map_val);
> >     __uint(map_flags, BPF_F_NO_COMMON_LRU);
> >     __uint(pinning, LIBBPF_PIN_BY_NAME);
> > } my_map SEC(".maps");
> > ```
> > I have several fentry/fexit programs that need to perform updates in
> > this map. After a certain number of map entries has been reached,
> > calls to bpf_map_update_elem start returning `-ENOMEM`. As one
> > example, I'm observing a program deployment where we have 816032
> > entries on a 64 CPU machine, and a certain portion of updates are
> > failing. I'm puzzled as to why this is occurring given that:
> > - The 1M entries should be preallocated upon map creation (since I'm
> > not using `BPF_F_NO_PREALLOC`)
> > - The host machine has over 120G of unused memory available at any given time
> >
> > I've previously reduced max_entries by 25% under the assumption that
> > this would prevent the problem from occurring, but this only caused
>
> For LRU map with BPF_F_NO_PREALLOC, the number of entries is distributed
> evenly between all CPUs. For your case, each CPU will have 1M/64 = 15625
> entries. In order to reduce of possibility of ENOMEM error, the right
> way is to increase the value of max_entries instead of decreasing it.
>
> > map updates to start failing at a lower threshold. I believe that this
> > is a problem with maps using the `BPF_F_NO_COMMON_LRU` flag, my
> > reasoning being that when map updates fail, it occurs consistently for
> > specific CPUs.
>
> Does the specific CPU always fail afterwards, or does it fail
> periodically ? Is the machine running the bpf program an arm64 host or
> an x86-64 host (namely uname -a) ? I suspect that the problem may be due
> to htab_lock_bucket() which may fail under arm64 host in v5.15.
>
> Could you please check and account the ratio of times when
> htab_lru_map_delete_node() returns 0 ? If the ratio high, it probably
> means that there may be too many overwrites of entries between different
> CPUs (e.g., CPU 0 updates key=X, then CPU 1 updates the same key again).
>
> > At this time, all machines experiencing the problem are running kernel
> > version 5.15, however I'm not currently able to try out any newer
> > kernels to confirm whether or not the same problem occurs there. Any
> > ideas on what could be responsible for this would be greatly
> > appreciated!
> >
> > Thanks,
> > Chase Hiltz
> >
> > .
>

--
Chase Hiltz
XDP Developer, Path Network
A 6991 E Camelback Rd., Suite D-300, Scottsdale AZ, 85251
W www.path.net
M +1 819 816 4353
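P.S. After writing the above, it occurred to me that a kretprobe might be
enough to get the ratio you asked about. Something like the following
bpftrace one-liner is what I have in mind (untested on my side, and it
assumes htab_lru_map_delete_node() isn't inlined and therefore still
shows up as a probeable symbol on my kernel):

```shell
# Tally htab_lru_map_delete_node() return values system-wide.
# A return of 0 would mean the target node was already gone from its
# bucket (i.e. overwritten/removed by another CPU, per your description).
# Ctrl-C stops tracing and prints the per-return-value counts.
sudo bpftrace -e 'kretprobe:htab_lru_map_delete_node { @ret[retval] = count(); }'
```

From the printed counts I could then compute the ratio of the 0 bucket to
the total. Does that look like it would measure what you're asking about,
or is there a better probe point?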