Hi,

On 12/7/2022 9:52 AM, Yonghong Song wrote:
>
>
> On 12/5/22 8:29 PM, Hou Tao wrote:
>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>
>> When there are batched freeing operations on a specific CPU, part of
>> the freed elements ((high_watermark - lower_watermark) / 2 + 1) will
>> be moved to waiting_for_gp list and the remaining part will be left in
>> free_by_rcu list and waits for the expiration of RCU-tasks-trace grace
>> period and the next invocation of free_bulk().
>
> The change below LGTM. However, the above description seems not precise.
> IIUC, free_by_rcu list => waiting_for_gp is controlled by whether
> call_rcu_in_progress is true or not. If it is true, free_by_rcu list
> will remain intact and not moving into waiting_for_gp list.
> So it is not 'the remaining part will be left in free_by_rcu'.
> It is all elements in free_by_rcu to waiting_for_gp or none.

Thanks for the review and the suggestions. What I tried to say is that moving
elements from free_by_rcu to waiting_for_gp is slow, so many free elements can
pile up on the free_by_rcu list. How about the following rephrasing, or do you
still prefer "It is all elements in free_by_rcu to waiting_for_gp or none."?

  When there are batched freeing operations on a specific CPU, part of the
  freed elements ((high_watermark - low_watermark) / 2 + 1) will be moved to
  the waiting_for_gp list and the remaining part will be left on the
  free_by_rcu list. The elements on the free_by_rcu list are only moved to
  the waiting_for_gp list after one RCU-tasks-trace grace period and another
  invocation of free_bulk(), so many free elements may pile up on the
  free_by_rcu list.

>>
>> So instead of invoking __alloc_percpu_gfp() or kmalloc_node() to
>> allocate a new object, in alloc_bulk() just check whether or not there
>> is freed element in free_by_rcu and reuse it if available.
>>
>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
>
> LGTM except the above suggestions.
>
> Acked-by: Yonghong Song <yhs@xxxxxx>
>
>> ---
>>   kernel/bpf/memalloc.c | 21 ++++++++++++++++++---
>>   1 file changed, 18 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>> index 8f0d65f2474a..7daf147bc8f6 100644
>> --- a/kernel/bpf/memalloc.c
>> +++ b/kernel/bpf/memalloc.c
>> @@ -171,9 +171,24 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
>>       memcg = get_memcg(c);
>>       old_memcg = set_active_memcg(memcg);
>>       for (i = 0; i < cnt; i++) {
>> -        obj = __alloc(c, node);
>> -        if (!obj)
>> -            break;
>> +        /*
>> +         * free_by_rcu is only manipulated by irq work refill_work().
>> +         * IRQ works on the same CPU are called sequentially, so it is
>> +         * safe to use __llist_del_first() here. If alloc_bulk() is
>> +         * invoked by the initial prefill, there will be no running
>> +         * irq work, so __llist_del_first() is fine as well.
>> +         *
>> +         * In most cases, objects on free_by_rcu are from the same CPU.
>> +         * If some objects come from other CPUs, it doesn't incur any
>> +         * harm because NUMA_NO_NODE means the preference for current
>> +         * numa node and it is not a guarantee.
>> +         */
>> +        obj = __llist_del_first(&c->free_by_rcu);
>> +        if (!obj) {
>> +            obj = __alloc(c, node);
>> +            if (!obj)
>> +                break;
>> +        }
>>           if (IS_ENABLED(CONFIG_PREEMPT_RT))
>>               /* In RT irq_work runs in per-cpu kthread, so disable
>>                * interrupts to avoid preemption and interrupts and
>
> .
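
For context, here is a simplified sketch of the free path that the commit
message describes, based on my reading of kernel/bpf/memalloc.c around this
series rather than the verbatim upstream code. The helper names
(do_call_rcu(), enque_to_free(), __free_rcu_tasks_trace()) follow that file
but may differ slightly from the exact tree this patch targets. The point it
illustrates is the one raised above: free_bulk() pushes freed objects onto
free_by_rcu, and the move to waiting_for_gp is all-or-nothing, gated by
call_rcu_in_progress, so objects can sit on free_by_rcu for a while.

/* Sketch: free_bulk() has just pushed the freed objects onto
 * c->free_by_rcu via enque_to_free() and now calls do_call_rcu().
 */
static void do_call_rcu(struct bpf_mem_cache *c)
{
	struct llist_node *llnode, *t;

	/* An earlier RCU-tasks-trace grace period is still in flight:
	 * leave everything on free_by_rcu. The next free_bulk() will try
	 * again, so elements can pile up here in the meantime.
	 */
	if (atomic_xchg(&c->call_rcu_in_progress, 1))
		return;

	WARN_ON_ONCE(!llist_empty(&c->waiting_for_gp));
	/* Move all of free_by_rcu to waiting_for_gp, then start a single
	 * RCU-tasks-trace grace period for the whole batch.
	 */
	llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu))
		__llist_add(llnode, &c->waiting_for_gp);
	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
}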