Hi,

In my setup, both maps (LRU and HASH) are preallocated.
Kernel version: Linux version 5.17.12-300.fc36.x86_64
I am running a load test with a load generator (Spirent) against a DUT appliance.

Code snippet (the HASH and LRU definitions are alternatives; only one
flow_table is built at a time):

#define MAXIMUM_CONNECTIONS 3000000
#define CALL_BACK_TIME 60000000000  /* 60 s in ns */

struct ip_flow_tuple {
    ...
};

struct ip_flow_entry {
    ...
    struct bpf_timer timer;
};

// HASH
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, MAXIMUM_CONNECTIONS);
    __type(key, struct ip_flow_tuple);
    __type(value, struct ip_flow_entry);
} flow_table __attribute__((section(".maps"), used));

// LRU
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, MAXIMUM_CONNECTIONS);
    __type(key, struct ip_flow_tuple);
    __type(value, struct ip_flow_entry);
} flow_table __attribute__((section(".maps"), used));

SEC("xdp")
int testMapTimer(struct xdp_md *ctx)
{
    ...
    struct ip_flow_tuple in_ip_flow_tuple = { ... };

    struct ip_flow_entry *in_ip_flow_entry =
        bpf_map_lookup_elem(&flow_table, &in_ip_flow_tuple);
    if (in_ip_flow_entry == NULL) {
        struct ip_flow_entry in_ip_flow_entry_new = {};

        bpf_map_update_elem(&flow_table, &in_ip_flow_tuple,
                            &in_ip_flow_entry_new, BPF_ANY);
        struct ip_flow_entry *flow_entry_value =
            bpf_map_lookup_elem(&flow_table, &in_ip_flow_tuple);
        if (flow_entry_value) {
            bpf_timer_init(&flow_entry_value->timer, &flow_table, 0);
            bpf_timer_set_callback(&flow_entry_value->timer, myTimerCallback);
            bpf_timer_start(&flow_entry_value->timer, (__u64)CALL_BACK_TIME, 0);
        }
    }
    ...
}

On Fri, Mar 17, 2023 at 6:41 PM Hou Tao <houtao1@xxxxxxxxxx> wrote:
>
> On 3/18/2023 12:40 AM, Chris Lai wrote:
> > Might be a bug using bpf_timer on Hashmap?
> > With the same setup using bpf_timer but with LRU_Hashmap, the memory
> > usage is way better: see following
> >
> > with LRU_Hashmap
> > 16M capacity, 1 minute bpf_timer callback/cleanup (pre-allocation
> > ~5G), memory usage peaked ~7G (flat and does not fluctuate - unlike
> > Hashmap)
> > 32M capacity, 1 minute bpf_timer callback/cleanup
> > (pre-allocation ~8G), memory usage peaked ~12G (flat and does not
> > fluctuate - unlike Hashmap)
>
> In your setup, the LRU hash map is preallocated and the normal hash map is
> not preallocated (aka BPF_F_NO_PREALLOC), right? If that is true, could you
> please test the memory usage of a preallocated hash map? Could you also
> share the version of the Linux kernel used and the way the hash map is
> created and operated on?
>
> > On Thu, Mar 16, 2023 at 6:22 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> >> On Thu, Mar 16, 2023 at 12:18 PM Chris Lai <chrlai@xxxxxxxxxxxxx> wrote:
> >>> Hello,
> >>> We are using a BPF hashmap with a bpf_timer in each entry value and a
> >>> callback to delete the entry after 1 minute.
> >>> Constantly generating load to insert elements into the map, we have
> >>> observed the following:
> >>> - 3M map capacity, 1 minute bpf_timer callback/cleanup, memory usage
> >>>   peaked around 5GB
> >>> - 16M map capacity, 1 minute bpf_timer callback/cleanup, memory usage
> >>>   peaked around 34GB
> >>> - 24M map capacity, 1 minute bpf_timer callback/cleanup, memory usage
> >>>   peaked around 55GB
> >>> Wondering if this is expected and what is causing the huge increase in
> >>> memory as we increase the number of elements inserted into the map.
> >>> Thank you.
>
> Do the addition and deletion of hash map entries happen on different CPUs?
> If so, and the bpf memory allocator is used (kernel version >= 6.1), the
> memory blow-up may be explainable: a new allocation cannot reuse the memory
> freed by an entry deletion, so memory usage increases rapidly. I have tested
> such a case and also written a selftest for it, but it seems it can only be
> mitigated [1], because the RCU tasks trace grace period is slow. If your
> setup is sticking to a non-preallocated hash map, you could first try adding
> "rcupdate.rcu_task_enqueue_lim=nr_cpus" to the kernel boot command line to
> mitigate the problem.
>
> [1] https://lore.kernel.org/bpf/20221209010947.3130477-1-houtao@xxxxxxxxxxxxxxx/
>
> >> That's not normal. Do you have a small reproducer?