On 12/19/23 3:31 AM, Hou Tao wrote:
Hi,
On 12/18/2023 2:30 PM, Yonghong Song wrote:
Typically, for a percpu map element or data structure, once allocated,
most operations are lookups or in-place updates. Deletions are really
rare. Currently, for percpu data structures, 4 elements will be
refilled if the size is <= 256. Let us just refill one element
for percpu data. For example, for size 256 and 128 cpus, the
potential saving will be 3 * 256 * 128 * 128 = 12MB.
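For reference, my reading of the arithmetic behind that figure, assuming
prefill_mem_cache() runs once per possible CPU:

  /* Back-of-the-envelope check of the 12MB figure:
   *   3 saved elements per cache
   *     * (256 bytes * 128 cpus)   percpu footprint of one element
   *     * 128 per-cpu caches
   *     = 12,582,912 bytes ~= 12MB
   */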
Acked-by: Hou Tao <houtao1@xxxxxxxxxx>
Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
---
kernel/bpf/memalloc.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 50ab2fecc005..f37998662146 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -485,11 +485,16 @@ static void init_refill_work(struct bpf_mem_cache *c)
static void prefill_mem_cache(struct bpf_mem_cache *c, int cpu)
{
- /* To avoid consuming memory assume that 1st run of bpf
- * prog won't be doing more than 4 map_update_elem from
- * irq disabled region
+ int cnt = 1;
+
+ /* To avoid consuming memory, for non-percpu allocation, assume that
+ * 1st run of bpf prog won't be doing more than 4 map_update_elem from
+ * irq disabled region if unit size is less than or equal to 256.
+ * For all other cases, let us just do one allocation.
*/
- alloc_bulk(c, c->unit_size <= 256 ? 4 : 1, cpu_to_node(cpu), false);
+ if (!c->percpu_size && c->unit_size <= 256)
+ cnt = 4;
+ alloc_bulk(c, cnt, cpu_to_node(cpu), false);
}
Another thought about this patch. When the prefilled element is
allocated by the invocation of bpf_percpu_obj_new(), the prefill will
trigger again, and this time it will allocate c->batch elements. For a
256-byte unit_size, c->batch will be 64, so the maximal memory
consumption on a 128-cpu host will be: 64 * 256 * 128 * 128 = 256MB
when there is one allocation of bpf_percpu_obj_new() on each CPU. And my
question is: should we adjust the low_watermark and high_watermark
accordingly for per-cpu allocation to reduce the memory consumption?
Actually, it should be 48 * 256 * 128 * 128 in the worst case, due to
c->batch = max((c->high_watermark - c->low_watermark) / 4 * 3, 1);
But in reality, for percpu allocation, we probably won't have allocations
on all 128 cpus.
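To spell out where the 48 comes from, assuming the current defaults in
init_refill_work() for unit_size <= 256 are low_watermark = 32 and
high_watermark = 96 (if I am reading the code right):

  c->low_watermark = 32;
  c->high_watermark = 96;
  c->batch = max((96 - 32) / 4 * 3, 1);  /* = 48 */

so the worst case is 48 * 256 * 128 * 128 = 201,326,592 bytes ~= 192MB,
still sizable but below the 256MB estimate.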
But your point is taken: for percpu allocation, we will have far fewer
allocations, so we should not use the current low_watermark/high_watermark
values used for non-percpu allocations. So for percpu allocation, I
suggest doing:
low_watermark = 1;
high_watermark = 3;
c->batch = 1;
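A minimal sketch of how that could look, assuming the adjustment lives in
init_refill_work() and that c->percpu_size is already set by the time it
runs (untested, just to illustrate the idea):

  static void init_refill_work(struct bpf_mem_cache *c)
  {
          init_irq_work(&c->refill_work, bpf_mem_refill);
          if (c->percpu_size) {
                  /* Percpu allocations are rare; keep the cache tiny. */
                  c->low_watermark = 1;
                  c->high_watermark = 3;
          } else if (c->unit_size <= 256) {
                  c->low_watermark = 32;
                  c->high_watermark = 96;
          } else {
                  c->low_watermark = max(32 * 256 / c->unit_size, 1);
                  c->high_watermark = max(96 * 256 / c->unit_size, 3);
          }
          /* For percpu: (3 - 1) / 4 * 3 == 0, so max() clamps batch to 1. */
          c->batch = max((c->high_watermark - c->low_watermark) / 4 * 3, 1);
  }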
Thanks!