Hi Steven,

>  /* Called from hardirq context */
> -static void try_to_push_tasks(void *arg)
> +void rto_push_irq_work_func(struct irq_work *work)
>  {
> -	struct rt_rq *rt_rq = arg;
> -	struct rq *rq, *src_rq;
> -	int this_cpu;
> +	struct rq *rq;
>  	int cpu;
>
> -	this_cpu = rt_rq->push_cpu;
> +	rq = this_rq();
>
> -	/* Paranoid check */
> -	BUG_ON(this_cpu != smp_processor_id());
> -
> -	rq = cpu_rq(this_cpu);
> -	src_rq = rq_of_rt_rq(rt_rq);
> -
> -again:
> +	/*
> +	 * We do not need to grab the lock to check for has_pushable_tasks.
> +	 * When it gets updated, a check is made if a push is possible.
> +	 */
>  	if (has_pushable_tasks(rq)) {
>  		raw_spin_lock(&rq->lock);
> -		push_rt_task(rq);
> +		push_rt_tasks(rq);
>  		raw_spin_unlock(&rq->lock);
>  	}
>
> -	/* Pass the IPI to the next rt overloaded queue */
> -	raw_spin_lock(&rt_rq->push_lock);
> -	/*
> -	 * If the source queue changed since the IPI went out,
> -	 * we need to restart the search from that CPU again.
> -	 */
> -	if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) {
> -		rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART;
> -		rt_rq->push_cpu = src_rq->cpu;
> -	}
> +	raw_spin_lock(&rq->rd->rto_lock);
>
> -	cpu = find_next_push_cpu(src_rq);
> +	/* Pass the IPI to the next rt overloaded queue */
> +	cpu = rto_next_cpu(rq);
>
> -	if (cpu >= nr_cpu_ids)
> -		rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING;
> -	raw_spin_unlock(&rt_rq->push_lock);
> +	raw_spin_unlock(&rq->rd->rto_lock);
>
> -	if (cpu >= nr_cpu_ids)
> +	if (cpu < 0)
>  		return;

I am seeing a "spinlock already unlocked" BUG for rd->rto_lock on a 4.9 stable
based kernel. The issue is observed only after the inclusion of this patch. It
appears to me that rq->rd can change between the spinlock being acquired and
released in the rto_push_irq_work_func() IRQ work while hotplug is in progress.
It was reported only a couple of times during long stress testing. The issue
can be reproduced easily if an artificial delay is introduced between the lock
and unlock of rto_lock. Since rq->rd is changed under rq->lock, we can protect
against this race with rq->lock. The patch below solved the problem. We already
take rq->lock in pull_rt_task()->tell_cpu_to_push(), so I extended the same
here. Please let me know your thoughts on this.

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index d863d39..478192b 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2284,6 +2284,7 @@ void rto_push_irq_work_func(struct irq_work *work)
 		raw_spin_unlock(&rq->lock);
 	}
 
+	raw_spin_lock(&rq->lock);
 	raw_spin_lock(&rq->rd->rto_lock);
 
 	/* Pass the IPI to the next rt overloaded queue */
@@ -2291,11 +2292,10 @@ void rto_push_irq_work_func(struct irq_work *work)
 
 	raw_spin_unlock(&rq->rd->rto_lock);
 
-	if (cpu < 0)
-		return;
-
 	/* Try the next RT overloaded CPU */
-	irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+	if (cpu >= 0)
+		irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+	raw_spin_unlock(&rq->lock);
 }
 
 #endif /* HAVE_RT_PUSH_IPI */

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
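
For completeness, the rq->rd swap that the IRQ work races with is done by
rq_attach_root() in kernel/sched/core.c while holding rq->lock, which is why
extending rq->lock over the rto_lock section closes the race. The sketch below
is trimmed and paraphrased from memory, not a verbatim copy of the 4.9 source:

void rq_attach_root(struct rq *rq, struct root_domain *rd)
{
	struct root_domain *old_rd = NULL;
	unsigned long flags;

	raw_spin_lock_irqsave(&rq->lock, flags);

	if (rq->rd) {
		old_rd = rq->rd;
		/* detach rq from the old root domain (details elided) */
		if (!atomic_dec_and_test(&old_rd->refcount))
			old_rd = NULL;
	}

	atomic_inc(&rd->refcount);
	rq->rd = rd;	/* the pointer swap the IRQ work can observe */
	/* attach rq to the new root domain (details elided) */

	raw_spin_unlock_irqrestore(&rq->lock, flags);

	/* the old root domain may be freed once the last reference drops */
	if (old_rd)
		call_rcu_sched(&old_rd->rcu, free_rootdomain);
}

Since both the pointer swap and the hand-off of the old root domain to RCU
freeing happen under rq->lock, holding rq->lock across the rto_lock critical
section guarantees that the root domain whose rto_lock we took is the same one
we unlock.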