Commit b4c6f86ec2f64 ("irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT") routes every irq_work without the IRQ_WORK_HARD_IRQ flag to the lazy_list on PREEMPT_RT. However, such irq_work still differs from irq_work flagged with IRQ_WORK_LAZY:

- With IRQ_WORK_LAZY (type1): the work is only executed after the next timer tick wakes up irq_workd, so it sees extra scheduling delay.

- Without IRQ_WORK_LAZY and without IRQ_WORK_HARD_IRQ (type2): irq_workd is woken up from the IPI interrupt, so the work gets a faster response.

I want to know whether this difference is by design. If it is, we have a problem: a type2 irq_work will not execute as quickly as expected, because it can be delayed behind a type1 irq_work. Consider the following scenario:

A CPU queues a type1 irq_work A and then a type2 irq_work B. B is not executed quickly, because no IPI is issued to wake up irq_workd: the llist_add() call returns false since A already made the list non-empty. This patch issues IPI_IRQ_WORK in that case so that B executes quickly (a small user-space sketch of the lost wakeup is included below the "---" line).

One thing that still needs to be optimized: additions to lazy_list (work->node.llist) and updates of lazy_work_raised must be atomic with respect to each other, and this patch disables local CPU interrupts to get that atomicity. There is probably a better way to keep these two variables consistent, and I can dig into it if this behaviour is not by design and needs to be fixed.

If these two types of irq_work are meant to have the same priority, maybe we should instead change

	if (!lazy_work || tick_nohz_tick_stopped()) {
		arch_irq_work_raise();
	}

to

	if (!(lazy_work || rt_lazy_work) || tick_nohz_tick_stopped()) {
		arch_irq_work_raise();
	}

I'm uploading this patch just to explain the problem; please don't pay too much attention to the ugly changes below.

Signed-off-by: Schspa Shi <schspa@xxxxxxxxx>
---
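Note below the cut line, not part of the change itself: the stand-alone user-space sketch below mirrors the PREEMPT_RT path of __irq_work_queue_local() before this patch, to make the lost wakeup easier to follow. The fake_* names, the single-list model and the assumption that tick_nohz_tick_stopped() returns false are mine and only for illustration.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define IRQ_WORK_LAZY		1	/* stand-in flag values, not the kernel's */
#define IRQ_WORK_HARD_IRQ	2

struct fake_work {
	int flags;
	struct fake_work *next;
};

static struct fake_work *fake_lazy_list;	/* models this CPU's lazy_list */
static int ipis_raised;

/* models llist_add(): returns true only if the list was empty before */
static bool fake_llist_add(struct fake_work *w)
{
	bool was_empty = (fake_lazy_list == NULL);

	w->next = fake_lazy_list;
	fake_lazy_list = w;
	return was_empty;
}

static void fake_arch_irq_work_raise(void)
{
	ipis_raised++;
}

/* mirrors __irq_work_queue_local() on PREEMPT_RT for non-HARD_IRQ work */
static void fake_queue_local(struct fake_work *w)
{
	bool lazy_work = w->flags & IRQ_WORK_LAZY;

	/*
	 * On PREEMPT_RT both type1 (IRQ_WORK_LAZY) and type2 (no flags)
	 * land on the same per-CPU lazy_list, so they share this add.
	 */
	if (!fake_llist_add(w))
		return;			/* list already non-empty: no IPI */

	/* tick_nohz_tick_stopped() is assumed to be false here */
	if (!lazy_work)
		fake_arch_irq_work_raise();
}

int main(void)
{
	struct fake_work a = { .flags = IRQ_WORK_LAZY };	/* type1 */
	struct fake_work b = { .flags = 0 };			/* type2 */

	fake_queue_local(&a);	/* list was empty, but A is lazy: no IPI */
	fake_queue_local(&b);	/* list non-empty because of A: early return */

	printf("IPIs raised after queueing A then B: %d\n", ipis_raised);
	return 0;
}

Compiled as an ordinary C program this prints 0: no IPI is ever raised for B, so B waits for the next tick together with A even though it is not marked IRQ_WORK_LAZY.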
 kernel/irq_work.c | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 7afa40fe5cc43..d5d0b720fac15 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -25,6 +25,7 @@
 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
 static DEFINE_PER_CPU(struct task_struct *, irq_workd);
+static DEFINE_PER_CPU(bool, lazy_work_raised);
 
 static void wake_irq_workd(void)
 {
@@ -81,6 +82,7 @@ static void __irq_work_queue_local(struct irq_work *work)
 	bool rt_lazy_work = false;
 	bool lazy_work = false;
 	int work_flags;
+	unsigned long flags;
 
 	work_flags = atomic_read(&work->node.a_flags);
 	if (work_flags & IRQ_WORK_LAZY)
@@ -94,12 +96,35 @@ static void __irq_work_queue_local(struct irq_work *work)
 	else
 		list = this_cpu_ptr(&raised_list);
 
-	if (!llist_add(&work->node.llist, list))
+	local_irq_save(flags);
+	if (!llist_add(&work->node.llist, list)) {
+		bool irq_raised;
+
+		/*
+		 * On PREEMPT_RT, if a lazy work was already added to the
+		 * list, the IPI may not have been raised. We need an extra
+		 * check for PREEMPT_RT.
+		 */
+		irq_raised = !xchg(this_cpu_ptr(&lazy_work_raised), true);
+		local_irq_restore(flags);
+		if (unlikely(!irq_raised))
+			arch_irq_work_raise();
 		return;
+	}
 
 	/* If the work is "lazy", handle it from next tick if any */
-	if (!lazy_work || tick_nohz_tick_stopped())
+	if (!lazy_work || tick_nohz_tick_stopped()) {
+		(void) xchg(this_cpu_ptr(&lazy_work_raised), true);
+		local_irq_restore(flags);
 		arch_irq_work_raise();
+	} else if (lazy_work || rt_lazy_work) {
+		/*
+		 * The first queued irq work did not raise an IPI; we need
+		 * to raise one for the next queued irq work.
+		 */
+		(void) xchg(this_cpu_ptr(&lazy_work_raised), false);
+		local_irq_restore(flags);
+	}
 }
 
 /* Enqueue the irq work @work on the current CPU */
@@ -151,9 +176,18 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
 	 */
 	if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
 	    !(atomic_read(&work->node.a_flags) & IRQ_WORK_HARD_IRQ)) {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		if (!llist_add(&work->node.llist, &per_cpu(lazy_list, cpu))) {
+			if (!xchg(this_cpu_ptr(&lazy_work_raised), true))
+				arch_irq_work_raise();
 
-		if (!llist_add(&work->node.llist, &per_cpu(lazy_list, cpu)))
+			local_irq_restore(flags);
 			goto out;
+		}
+		(void) xchg(this_cpu_ptr(&lazy_work_raised), true);
+		local_irq_restore(flags);
 
 		work = &per_cpu(irq_work_wakeup, cpu);
 		if (!irq_work_claim(work))
--
2.37.0