[PATCH bpf-next 1/3] bpf: Enable preemption after irq_work_raise() in unit_alloc()

From: Hou Tao <houtao1@xxxxxxxxxx>

During stress testing of qp-trie, bpf_mem_alloc() unexpectedly
returned NULL even though all qp-trie operations were initiated from
bpf syscalls and there was still free memory available. bpf_obj_new()
has the same problem, as shown by the following selftest.

The failure is caused by preemption. irq_work_raise() first invokes
irq_work_claim() to mark the irq work as pending, and then invokes
__irq_work_queue_local() to raise an IPI. So when the task invoking
irq_work_raise() is preempted between these two steps, unit_alloc()
may return NULL to the preempting task, as shown below:

task A         task B

unit_alloc()
  // low_watermark = 32
  // free_cnt = 31 after alloc
  irq_work_raise()
    // mark irq work as IRQ_WORK_PENDING
    irq_work_claim()

	       // task B preempts task A
	       unit_alloc()
	         // free_cnt = 30 after alloc
	         // irq work is already PENDING,
	         // so just return
	         irq_work_raise()
	       // does unit_alloc() 30 times
	       ......
	       unit_alloc()
	         // free_cnt = 0 before alloc
	         return NULL
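
The second irq_work_raise() being a no-op follows from the structure
of irq_work_queue() itself. Roughly (simplified from
kernel/irq_work.c; the actual function also records a KASAN aux
stack):

  bool irq_work_queue(struct irq_work *work)
  {
          /* Only queue if the work is not already pending */
          if (!irq_work_claim(work))
                  return false;

          /* The claim above is outside the preempt-disabled region
           * below, so a task can be preempted after claiming the
           * work but before queueing it locally.
           */
          preempt_disable();
          __irq_work_queue_local(work);
          preempt_enable();

          return true;
  }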

Fix it by invoking preempt_disable_notrace() before the allocation and
invoking preempt_enable_notrace() to re-enable preemption after
irq_work_raise() completes. An alternative fix is to move
local_irq_restore() after the invocation of irq_work_raise(), but that
would enlarge the irq-disabled region. Another feasible fix is to
disable preemption only around the invocation of irq_work_queue() in
irq_work_raise() and enable it right after, but that can't handle the
case where c->low_watermark is 1: the irq work is then raised only
after free_cnt has already dropped to 0, so a task preempting the
allocator before irq_work_raise() runs would still find an empty
free_llist and get NULL, as sketched below.
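
For illustration, the rejected narrower fix would look roughly like
the following (a sketch, assuming irq_work_raise() remains a thin
wrapper around irq_work_queue() as it is today):

  static void irq_work_raise(struct bpf_mem_cache *c)
  {
          /* Rejected alternative: only cover the queueing itself.
           * This closes the claim-vs-queue window inside
           * irq_work_queue(), but with c->low_watermark == 1 the
           * irq work is only raised once free_cnt has already hit 0,
           * so a task preempting the allocator before this function
           * runs still finds an empty free_llist.
           */
          preempt_disable_notrace();
          irq_work_queue(&c->refill_work);
          preempt_enable_notrace();
  }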

Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
---
 kernel/bpf/memalloc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 9c49ae53deaf..83f8913ebb0a 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -6,6 +6,7 @@
 #include <linux/irq_work.h>
 #include <linux/bpf_mem_alloc.h>
 #include <linux/memcontrol.h>
+#include <linux/preempt.h>
 #include <asm/local.h>
 
 /* Any context (including NMI) BPF specific memory allocator.
@@ -725,6 +726,7 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
 	 * Use per-cpu 'active' counter to order free_list access between
 	 * unit_alloc/unit_free/bpf_mem_refill.
 	 */
+	preempt_disable_notrace();
 	local_irq_save(flags);
 	if (local_inc_return(&c->active) == 1) {
 		llnode = __llist_del_first(&c->free_llist);
@@ -740,6 +742,12 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
 
 	if (cnt < c->low_watermark)
 		irq_work_raise(c);
+	/* Enable preemption after the enqueue of the irq work completes,
+	 * so free_llist may be refilled by the irq work before another
+	 * task preempts the current task.
+	 */
+	preempt_enable_notrace();
+
 	return llnode;
 }
 
-- 
2.29.2




