On Wed, Feb 12, 2025 at 12:49 AM Changwoo Min <changwoo@xxxxxxxxxx> wrote:
>
> When there is no entry in the free list (c->free_llist), unit_alloc()
> fails even when there is available memory in the system, causing allocation
> failure in various BPF calls -- such as bpf_mem_alloc() and
> bpf_cpumask_create().
>
> Such allocation failure can happen, especially when a BPF program tries many
> allocations -- more than the delta between the high and low watermarks -- in
> an IRQ-disabled context.

Can we add a selftest for this scenario?

> To address the problem, when there is no free entry, refill one entry on the
> free list (alloc_bulk) and then retry the allocation procedure on the free
> list. Note that since some callers of unit_alloc() are not allowed to block
> (e.g., bpf_cpumask_create), allocate the additional free entry in an atomic
> manner (atomic = true in alloc_bulk).
>
> Signed-off-by: Changwoo Min <changwoo@xxxxxxxxxx>
> ---
>  kernel/bpf/memalloc.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 889374722d0a..22fe9cfb2b56 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -784,6 +784,7 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
>  	struct llist_node *llnode = NULL;
>  	unsigned long flags;
>  	int cnt = 0;
> +	bool retry = false;

"retry = false;" reads weird to me. Maybe rename it to "retried"?

>
>  	/* Disable irqs to prevent the following race for majority of prog types:
>  	 * prog_A
>  	 *   bpf_mem_alloc()
>  	 *      preemption or irq -> prog_B
>  	 *        bpf_mem_alloc()
>  	 * Use per-cpu 'active' counter to order free_list access between
>  	 * unit_alloc/unit_free/bpf_mem_refill.
>  	 */
> +retry_alloc:
>  	local_irq_save(flags);
>  	if (local_inc_return(&c->active) == 1) {
>  		llnode = __llist_del_first(&c->free_llist);
> @@ -815,6 +817,13 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
>  	 */
>  	local_irq_restore(flags);
>
> +	if (unlikely(!llnode && !retry)) {
> +		int cpu = smp_processor_id();
> +		alloc_bulk(c, 1, cpu_to_node(cpu), true);

cpu_to_node() is not necessary; we can just do
alloc_bulk(c, 1, NUMA_NO_NODE, true). Also, maybe we can let alloc_bulk()
return int (0 or -ENOMEM). For -ENOMEM, there is no need to goto
retry_alloc. Does this make sense?

Thanks,
Song

> +		retry = true;
> +		goto retry_alloc;
> +	}
> +
>  	return llnode;
>  }
>
> --
> 2.48.1
>
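
To be concrete about the alloc_bulk() suggestion: below is a rough,
untested sketch of the retry path I have in mind. It assumes alloc_bulk()
is changed to return 0 on success and -ENOMEM on failure (that return
type change is not part of this patch), and uses the "retried" name from
my comment above:

	if (unlikely(!llnode && !retried)) {
		/* Refill a single object. NUMA_NO_NODE lets the
		 * allocator pick a node, so smp_processor_id() and
		 * cpu_to_node() are not needed here.
		 */
		if (!alloc_bulk(c, 1, NUMA_NO_NODE, true)) {
			retried = true;
			goto retry_alloc;
		}
		/* -ENOMEM: a retry cannot succeed, so fail the
		 * allocation right away instead of looping back.
		 */
	}

	return llnode;

That way we only take the retry_alloc path when the refill actually put
something on c->free_llist.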
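
And for the selftest I asked about, something along the lines of the
sketch below could be a starting point (hypothetical program and names,
not tested). The idea is to hold on to every allocated cpumask so
unit_free() cannot refill the free list between allocations. Reliably
reproducing the IRQ-disabled case may need more than this, since in
syscall context the irq_work refill can still run between iterations:

/* Hypothetical selftest BPF program: try to exhaust the per-cpu free
 * list by allocating more cpumasks in one run than the refill can keep
 * up with, stashing each one in a map so it stays allocated.
 */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

struct bpf_cpumask *bpf_cpumask_create(void) __ksym;
void bpf_cpumask_release(struct bpf_cpumask *cpumask) __ksym;

struct map_value {
	struct bpf_cpumask __kptr *mask;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 128);	/* assumed to exceed the high watermark */
	__type(key, u32);
	__type(value, struct map_value);
} masks SEC(".maps");

int nr_failed;	/* checked by the userspace side of the test */

SEC("syscall")
int exhaust_free_llist(void *ctx)
{
	struct bpf_cpumask *mask, *old;
	struct map_value *v;
	u32 i;

	for (i = 0; i < 128; i++) {
		v = bpf_map_lookup_elem(&masks, &i);
		if (!v)
			break;
		mask = bpf_cpumask_create();
		if (!mask) {
			/* the failure this patch is meant to eliminate */
			nr_failed++;
			continue;
		}
		old = bpf_kptr_xchg(&v->mask, mask);
		if (old)
			bpf_cpumask_release(old);
	}
	return 0;
}

char _license[] SEC("license") = "GPL";

The userspace part would run the program once and assert nr_failed == 0
with the fix applied.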