Forgot to include the callback snippet:

static int myTimerCallback(void *map, struct ip_flow_tuple *key,
                           struct ip_flow_entry *val)
{
    /* Fires CALL_BACK_TIME ns after bpf_timer_start() and removes the
     * flow entry, so stale flows do not accumulate in the map. */
    bpf_map_delete_elem(map, key);
    return 0;
}

On Mon, Mar 20, 2023 at 10:16 AM Chris Lai <chrlai@xxxxxxxxxxxxx> wrote:
>
> Hi,
>
> In my setup, both maps (LRU and HASH) are preallocated.
> Kernel version: Linux version 5.17.12-300.fc36.x86_64
> I am doing a load test via a load generator (Spirent) against a DUT appliance.
>
> Code snippet:
>
> #define MAXIMUM_CONNECTIONS 3000000
> #define CALL_BACK_TIME 60000000000 /* 60 s, in ns */
>
> struct ip_flow_tuple {
>     ...
> };
>
> struct ip_flow_entry {
>     ...
>     struct bpf_timer timer;
> };
>
> // HASH
> struct {
>     __uint(type, BPF_MAP_TYPE_HASH);
>     __uint(max_entries, MAXIMUM_CONNECTIONS);
>     __type(key, struct ip_flow_tuple);
>     __type(value, struct ip_flow_entry);
> } flow_table __attribute__((section(".maps"), used));
>
> // LRU (alternative definition; only one flow_table is built at a time)
> struct {
>     __uint(type, BPF_MAP_TYPE_LRU_HASH);
>     __uint(max_entries, MAXIMUM_CONNECTIONS);
>     __type(key, struct ip_flow_tuple);
>     __type(value, struct ip_flow_entry);
> } flow_table __attribute__((section(".maps"), used));
>
> SEC("xdp")
> int testMapTimer(struct xdp_md *ctx) {
>     ...
>     struct ip_flow_tuple in_ip_flow_tuple = {
>         ...
>     };
>
>     struct ip_flow_entry *in_ip_flow_entry =
>         bpf_map_lookup_elem(&flow_table, &in_ip_flow_tuple);
>     if (in_ip_flow_entry == NULL) {
>         struct ip_flow_entry in_ip_flow_entry_new = {};
>         bpf_map_update_elem(&flow_table, &in_ip_flow_tuple,
>                             &in_ip_flow_entry_new, BPF_ANY);
>         /* Re-lookup so the timer is armed on the copy that lives in
>          * the map, not on the stack-local struct. */
>         struct ip_flow_entry *flow_entry_value =
>             bpf_map_lookup_elem(&flow_table, &in_ip_flow_tuple);
>
>         if (flow_entry_value) {
>             bpf_timer_init(&flow_entry_value->timer, &flow_table, 0);
>             bpf_timer_set_callback(&flow_entry_value->timer, myTimerCallback);
>             bpf_timer_start(&flow_entry_value->timer, (__u64)CALL_BACK_TIME, 0);
>         }
>     }
>     ...
> }
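One note on the snippet above: the three timer helpers can each fail (for example, bpf_timer_init() returns -EBUSY if another CPU racing on the same flow has already initialized the timer), so a production version would check their return values. A minimal sketch of the same block with error handling, using the names from the snippet:

        if (flow_entry_value) {
            /* bpf_timer_init() fails with -EBUSY if the timer was
             * already initialized, e.g. by a concurrent CPU. */
            long err = bpf_timer_init(&flow_entry_value->timer,
                                      &flow_table, 0);
            if (!err)
                err = bpf_timer_set_callback(&flow_entry_value->timer,
                                             myTimerCallback);
            if (!err)
                /* Arm the timer to fire CALL_BACK_TIME ns from now. */
                err = bpf_timer_start(&flow_entry_value->timer,
                                      (__u64)CALL_BACK_TIME, 0);
        }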
> On Fri, Mar 17, 2023 at 6:41 PM Hou Tao <houtao1@xxxxxxxxxx> wrote:
> >
> > On 3/18/2023 12:40 AM, Chris Lai wrote:
> > > Might be a bug using bpf_timer on a hash map?
> > > With the same setup using bpf_timer but with an LRU hash map, the
> > > memory usage is much better; see the following.
> > >
> > > With LRU_HASH:
> > > 16M capacity, 1 minute bpf_timer callback/cleanup (preallocation
> > > ~5G), memory usage peaked at ~7G (flat; does not fluctuate, unlike
> > > the hash map)
> > > 32M capacity, 1 minute bpf_timer callback/cleanup (preallocation
> > > ~8G), memory usage peaked at ~12G (flat; does not fluctuate, unlike
> > > the hash map)
> > In your setup, the LRU hash map is preallocated and the normal hash map
> > is not preallocated (aka BPF_F_NO_PREALLOC), right? If so, could you
> > please test the memory usage of a preallocated hash map? Could you also
> > share the version of the Linux kernel used and the way the hash map is
> > created and operated on?
> > >
> > > On Thu, Mar 16, 2023 at 6:22 PM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >> On Thu, Mar 16, 2023 at 12:18 PM Chris Lai <chrlai@xxxxxxxxxxxxx> wrote:
> > >>> Hello,
> > >>> Using a BPF hash map with a bpf_timer in each entry value and a
> > >>> callback that deletes the entry after 1 minute.
> > >>> Constantly creating load to insert elements into the map, we have
> > >>> observed the following:
> > >>> - 3M map capacity, 1 minute bpf_timer callback/cleanup: memory usage
> > >>>   peaked around 5 GB
> > >>> - 16M map capacity, 1 minute bpf_timer callback/cleanup: memory usage
> > >>>   peaked around 34 GB
> > >>> - 24M map capacity, 1 minute bpf_timer callback/cleanup: memory usage
> > >>>   peaked around 55 GB
> > >>> Wondering if this is expected, and what is causing the huge increase
> > >>> in memory as we increase the number of elements inserted into the map.
> > >>> Thank you.
> > Do the addition and the deletion of a hash map entry happen on different
> > CPUs? If so, and the BPF memory allocator is in use (kernel version >=
> > 6.1), the memory blow-up may be explainable: a new allocation cannot
> > reuse the memory freed by an entry deletion, so memory usage increases
> > rapidly. I have tested such a case and written a selftest for it, but it
> > seems it can only be mitigated [1], because the RCU tasks trace grace
> > period is slow. If your setup sticks with a non-preallocated hash map,
> > you could first try adding "rcupdate.rcu_task_enqueue_lim=nr_cpus" to the
> > kernel boot command line to mitigate the problem.
> >
> > [1] https://lore.kernel.org/bpf/20221209010947.3130477-1-houtao@xxxxxxxxxxxxxxx/
> >
> > >> That's not normal. Do you have a small reproducer?
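For completeness, the preallocation distinction discussed above is controlled by map_flags in the map definition. A minimal sketch of the non-preallocated variant that Hou Tao's explanation concerns (the name flow_table_no_prealloc is hypothetical; key/value types are from the snippet above):

    // Hypothetical non-preallocated variant: with BPF_F_NO_PREALLOC,
    // entries are allocated on update rather than up front, which is
    // the case the BPF memory allocator discussion above applies to.
    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __uint(max_entries, MAXIMUM_CONNECTIONS);
        __type(key, struct ip_flow_tuple);
        __type(value, struct ip_flow_entry);
    } flow_table_no_prealloc __attribute__((section(".maps"), used));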