Hello Song,
Thank you for the review!
On 25. 2. 13. 03:33, Song Liu wrote:
> On Wed, Feb 12, 2025 at 12:49 AM Changwoo Min <changwoo@xxxxxxxxxx> wrote:
> > When there is no entry in the free list (c->free_llist), unit_alloc()
> > fails even when there is available memory in the system, causing allocation
> > failure in various BPF calls -- such as bpf_mem_alloc() and
> > bpf_cpumask_create().
> >
> > Such allocation failure can happen, especially when a BPF program tries many
> > allocations -- more than the delta between the high and low watermarks -- in
> > an IRQ-disabled context.
>
> Can we add a selftest for this scenario?
It would be a bit tricky to create an IRQ-disabled case in a BPF
program. However, I think it should be possible to reproduce the
allocation failure by performing a sufficiently large number of small
allocations in a single program run.
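
To make the discussion concrete, below is a rough, untested sketch of
what I have in mind, modeled after the existing cpumask selftests.
NR_MASKS is a placeholder that would need to exceed the delta between
the cache's low and high watermarks, and all objects are held live at
once so that unit_free() cannot refill the free list between
allocations. Whether the verifier is happy keeping this many
references live in a stack array would still need checking:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* cpumask kfuncs are backed by bpf_mem_alloc() */
struct bpf_cpumask *bpf_cpumask_create(void) __ksym;
void bpf_cpumask_release(struct bpf_cpumask *cpumask) __ksym;

/* Placeholder: must exceed the watermark delta of the cpumask cache. */
#define NR_MASKS 32

int saw_alloc_failure;

SEC("tp_btf/task_newtask")
int BPF_PROG(drain_free_list, struct task_struct *task, u64 clone_flags)
{
	struct bpf_cpumask *masks[NR_MASKS] = {};
	int i;

	/* Hold every object live so the free list keeps draining. */
#pragma unroll
	for (i = 0; i < NR_MASKS; i++) {
		masks[i] = bpf_cpumask_create();
		if (!masks[i])
			saw_alloc_failure = 1; /* the failure this patch fixes */
	}

#pragma unroll
	for (i = 0; i < NR_MASKS; i++) {
		if (masks[i])
			bpf_cpumask_release(masks[i]);
	}
	return 0;
}

char _license[] SEC("license") = "GPL";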
> > To address the problem, when there is no free entry, refill one entry on the
> > free list (alloc_bulk) and then retry the allocation procedure on the free
> > list. Note that since some callers of unit_alloc() are not allowed to block
> > (e.g., bpf_cpumask_create), the additional free entry is allocated in an
> > atomic manner (atomic = true in alloc_bulk).
> > Signed-off-by: Changwoo Min <changwoo@xxxxxxxxxx>
> > ---
> >  kernel/bpf/memalloc.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> > index 889374722d0a..22fe9cfb2b56 100644
> > --- a/kernel/bpf/memalloc.c
> > +++ b/kernel/bpf/memalloc.c
> > @@ -784,6 +784,7 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
> >  	struct llist_node *llnode = NULL;
> >  	unsigned long flags;
> >  	int cnt = 0;
> > +	bool retry = false;
"retry = false;" reads weird to me. Maybe rename it as "retried"?
"retried" reads betters. Will fix it.
> >  	/* Disable irqs to prevent the following race for majority of prog types:
> >  	 * prog_A
> > @@ -795,6 +796,7 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
> >  	 * Use per-cpu 'active' counter to order free_list access between
> >  	 * unit_alloc/unit_free/bpf_mem_refill.
> >  	 */
> > +retry_alloc:
> >  	local_irq_save(flags);
> >  	if (local_inc_return(&c->active) == 1) {
> >  		llnode = __llist_del_first(&c->free_llist);
> > @@ -815,6 +817,13 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
> >  	 */
> >  	local_irq_restore(flags);
> >
> > +	if (unlikely(!llnode && !retry)) {
> > +		int cpu = smp_processor_id();
> > +
> > +		alloc_bulk(c, 1, cpu_to_node(cpu), true);
> cpu_to_node() is not necessary, we can just do
>   alloc_bulk(c, 1, NUMA_NO_NODE, true);
Sure, will change it as suggested.
> Also, maybe we can let alloc_bulk return int (0 or -ENOMEM).
> For -ENOMEM, there is no need to goto retry_alloc.
> Does this make sense?
Yup, I will change alloc_bulk() so that it returns 0 or -ENOMEM.
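
Putting your suggestions together (the "retried" rename, NUMA_NO_NODE,
and the return-value check), I imagine the v2 retry path will look
roughly like this (untested sketch):

	if (unlikely(!llnode && !retried)) {
		/* Refill a single entry atomically. On -ENOMEM there is
		 * no point retrying, so fall through and return NULL.
		 */
		if (!alloc_bulk(c, 1, NUMA_NO_NODE, true)) {
			retried = true;
			goto retry_alloc;
		}
	}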
Regards,
Changwoo Min