On 12/5/22 8:29 PM, Hou Tao wrote:
> From: Hou Tao <houtao1@xxxxxxxxxx>
>
> When there are batched freeing operations on a specific CPU, part of
> the freed elements ((high_watermark - lower_watermark) / 2 + 1) will be
> moved to waiting_for_gp list and the remaining part will be left in
> free_by_rcu list and waits for the expiration of RCU-tasks-trace grace
> period and the next invocation of free_bulk().
The change below LGTM. However, the above description does not seem
precise. IIUC, whether elements move from the free_by_rcu list to the
waiting_for_gp list is controlled by call_rcu_in_progress. If it is
true, the free_by_rcu list remains intact and nothing is moved to
waiting_for_gp. So it is not the case that "the remaining part will be
left in free_by_rcu": either all elements in free_by_rcu are moved to
waiting_for_gp, or none are.
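To make that concrete, here is a paraphrased sketch of the handover
logic as I read it in kernel/bpf/memalloc.c (simplified from memory,
not the literal source; the WARN_ON checks and the helper names such as
do_call_rcu() and __free_rcu_tasks_trace() are from my reading of the
file and may not match it verbatim):

static void do_call_rcu(struct bpf_mem_cache *c)
{
        struct llist_node *llnode, *t;

        /* A previous RCU-tasks-trace grace period is still pending:
         * leave free_by_rcu untouched, move nothing to waiting_for_gp.
         */
        if (atomic_xchg(&c->call_rcu_in_progress, 1))
                return;

        /* Otherwise move every element currently on free_by_rcu. */
        llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu))
                __llist_add(llnode, &c->waiting_for_gp);

        /* Free the whole batch once the grace period expires. */
        call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
}

So whenever call_rcu_in_progress is already set, the whole batch stays
on free_by_rcu until a later free_bulk() finds the flag cleared.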
> So instead of invoking __alloc_percpu_gfp() or kmalloc_node() to
> allocate a new object, in alloc_bulk() just check whether or not there
> is freed element in free_by_rcu and reuse it if available.
>
> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
LGTM except for the above suggestions.

Acked-by: Yonghong Song <yhs@xxxxxx>
> ---
>  kernel/bpf/memalloc.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 8f0d65f2474a..7daf147bc8f6 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -171,9 +171,24 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
>          memcg = get_memcg(c);
>          old_memcg = set_active_memcg(memcg);
>          for (i = 0; i < cnt; i++) {
> -                obj = __alloc(c, node);
> -                if (!obj)
> -                        break;
> +                /*
> +                 * free_by_rcu is only manipulated by irq work refill_work().
> +                 * IRQ works on the same CPU are called sequentially, so it is
> +                 * safe to use __llist_del_first() here. If alloc_bulk() is
> +                 * invoked by the initial prefill, there will be no running
> +                 * irq work, so __llist_del_first() is fine as well.
> +                 *
> +                 * In most cases, objects on free_by_rcu are from the same CPU.
> +                 * If some objects come from other CPUs, it doesn't incur any
> +                 * harm because NUMA_NO_NODE means the preference for current
> +                 * numa node and it is not a guarantee.
> +                 */
> +                obj = __llist_del_first(&c->free_by_rcu);
> +                if (!obj) {
> +                        obj = __alloc(c, node);
> +                        if (!obj)
> +                                break;
> +                }
>                  if (IS_ENABLED(CONFIG_PREEMPT_RT))
>                          /* In RT irq_work runs in per-cpu kthread, so disable
>                           * interrupts to avoid preemption and interrupts and
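As an aside for readers less familiar with the pattern, below is a
small stand-alone user-space model of the reuse-before-allocate idea in
the hunk above. It is only an illustration under the patch's assumption
that each free_by_rcu list has a single serialized consumer; every name
in it (cache_model, pop_first, alloc_one) is made up for the example
and is not the kernel API:

#include <stdio.h>
#include <stdlib.h>

/* Toy single-consumer model: pop a previously freed object from a
 * per-CPU-style free list before falling back to a fresh allocation.
 */
struct node {
        struct node *next;
};

struct cache_model {
        struct node *free_by_rcu;       /* stand-in for c->free_by_rcu */
};

/* Non-atomic pop, analogous in spirit to __llist_del_first(): only valid
 * because the model assumes a single serialized consumer, just like the
 * per-CPU irq work refill_work() in the kernel code.
 */
static struct node *pop_first(struct cache_model *c)
{
        struct node *obj = c->free_by_rcu;

        if (obj)
                c->free_by_rcu = obj->next;
        return obj;
}

static struct node *alloc_one(struct cache_model *c)
{
        struct node *obj = pop_first(c);        /* reuse a freed element ... */

        if (!obj)
                obj = calloc(1, sizeof(*obj));  /* ... or allocate a new one */
        return obj;
}

int main(void)
{
        struct cache_model c = { .free_by_rcu = NULL };
        struct node *obj = alloc_one(&c);       /* empty list: allocates */

        obj->next = c.free_by_rcu;              /* put it back on the free list */
        c.free_by_rcu = obj;
        printf("reused: %d\n", alloc_one(&c) == obj);   /* non-empty: reuses */
        free(obj);
        return 0;
}

(Not something to apply anywhere, just a mental model of why the
fallback to __alloc() only happens when free_by_rcu is empty.)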