Re: bpf_timer memory utilization

Hou Tao <houtao1@xxxxxxxxxx> · Sat, 18 Mar 2023 09:41:18 +0800



On 3/18/2023 12:40 AM, Chris Lai wrote:
> Might be a bug using bpf_timer on Hashmap?
> With same setups using bpf_timer but with LRU_Hashmap, the memory
> usage is way better: see following
>
> with LRU_Hashmap
> 16M capacity, 1 minute bpf_timer callback/cleanup..  (pre-allocation
> ~5G),  memory usage peaked ~7G (Flat and does not fluctuate - unlike
> Hashmap)
> 32M capacity, 1 minute bpf_timer callback/cleanup..  (pre-allocation
> ~8G),  memory usage peaked ~12G (Flat and does not fluctuate - unlike
> Hashmap)
In your setup, is the LRU hash map preallocated while the normal hash map is
not (i.e. created with BPF_F_NO_PREALLOC)? If so, could you please test the
memory usage of a preallocated hash map? Could you also share the Linux kernel
version you are using, and how you create and operate on the hash map?
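
To make the question concrete, here is a minimal userspace sketch of the map
attributes that differ between the two configurations. It only fills in a
union bpf_attr as it would be passed to the BPF_MAP_CREATE command.
fill_map_attr() and its parameters are hypothetical names chosen for
illustration, and the 4-byte key is an assumption. A real map whose value
embeds a struct bpf_timer also needs BTF information, which libbpf normally
supplies and which is omitted here.

```c
#include <assert.h>
#include <string.h>
#include <linux/bpf.h>

/*
 * Hypothetical helper: fill in the attributes for BPF_MAP_CREATE.
 * prealloc == 0 requests BPF_F_NO_PREALLOC, i.e. elements are allocated
 * at update time instead of when the map is created.
 */
static void fill_map_attr(union bpf_attr *attr, enum bpf_map_type type,
                          __u32 value_size, __u32 max_entries, int prealloc)
{
    /* LRU hash maps are always preallocated, so reject that combination. */
    assert(prealloc || type != BPF_MAP_TYPE_LRU_HASH);

    memset(attr, 0, sizeof(*attr));
    attr->map_type = type;
    attr->key_size = sizeof(__u32);   /* assumed key type for this sketch */
    attr->value_size = value_size;
    attr->max_entries = max_entries;
    attr->map_flags = prealloc ? 0 : BPF_F_NO_PREALLOC;
}
```

Calling fill_map_attr() with BPF_MAP_TYPE_HASH and prealloc set to 1 gives the
configuration I am asking you to test. With prealloc set to 0 it gives the
non-preallocated variant.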
>
>
>
> On Thu, Mar 16, 2023 at 6:22 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
>> On Thu, Mar 16, 2023 at 12:18 PM Chris Lai <chrlai@xxxxxxxxxxxxx> wrote:
>>> Hello,
>>> Using BPF Hashmap with bpf_timer for each entry value and callback to
>>> delete the entry after 1 minute.
>>> Constantly creating load to insert elements onto the map, we have
>>> observed the following:
>>> -3M map capacity, 1 minute bpf_timer callback/cleanup, memory usage
>>> peaked around 5GB
>>> -16M map capacity, 1 minute bpf_timer callback/cleanup, memory usage
>>> peaked around 34GB
>>> -24M map capacity, 1 minute bpf_timer callback/cleanup, memory usage
>>> peaked around 55GB
>>> Wondering if this is expected and what is causing the huge increase in
>>> memory as we increase the number of elements inserted onto the map.
>>> Thank you.
Do the addition and deletion of hash map entries happen on different CPUs? If
so, and the bpf memory allocator is used (kernel version >= 6.1), the memory
blow-up may be explainable: new allocations cannot reuse the memory freed by
entry deletion, so memory usage increases rapidly. I have tested such a case and
also written a selftest for it, but it seems the problem can only be mitigated
[1], because the RCU tasks trace GP is slow. If your setup has to stick with a
non-preallocated hash map, you could first try adding
"rcupdate.rcu_task_enqueue_lim=nr_cpus" to the kernel boot command line to
mitigate the problem.

[1] https://lore.kernel.org/bpf/20221209010947.3130477-1-houtao@xxxxxxxxxxxxxxx/
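
As a rough illustration of the cross-CPU effect described above, the toy model
below counts how many objects have to come from the backing allocator when
entries are inserted on one CPU and deleted on another. It is not the kernel's
bpf memory allocator: all toy_* names are made up for this sketch, and it
ignores that the real allocator eventually returns excess cached objects after
a grace period.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 2

/* Toy per-CPU cache: each CPU can only reuse objects freed on that CPU. */
struct toy_alloc {
    size_t free_cached[TOY_NR_CPUS]; /* objects on each CPU's free list */
    size_t from_backing;             /* objects taken from the backing allocator */
};

static void toy_alloc_obj(struct toy_alloc *a, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    if (a->free_cached[cpu] > 0)
        a->free_cached[cpu]--;       /* reuse a locally freed object */
    else
        a->from_backing++;           /* local cache empty: footprint grows */
}

static void toy_free_obj(struct toy_alloc *a, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    a->free_cached[cpu]++;           /* object lands on the freeing CPU's list */
}

/*
 * Insert and then delete n entries, one at a time, and return how many
 * objects had to be taken from the backing allocator.
 */
static size_t toy_churn(int alloc_cpu, int free_cpu, size_t n)
{
    struct toy_alloc a = {0};

    for (size_t i = 0; i < n; i++) {
        toy_alloc_obj(&a, alloc_cpu);
        toy_free_obj(&a, free_cpu);
    }
    return a.from_backing;
}
```

In this model toy_churn(0, 0, 1000) needs only 1 object from the backing
allocator, while toy_churn(0, 1, 1000) needs 1000, because the allocating CPU
never sees the objects freed on the other CPU.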
>> That's not normal. Do you have a small reproducer?