On Fri, Jan 19, 2018 at 10:03:53AM -0500, Steven Rostedt wrote:
> On Fri, 19 Jan 2018 14:53:05 +0530
> Pavan Kondeti <pkondeti@xxxxxxxxxxxxxx> wrote:
>
> > I am seeing "spinlock already unlocked" BUG for rd->rto_lock on a 4.9
> > stable kernel based system. This issue is observed only after
> > inclusion of this patch. It appears to me that rq->rd can change
> > between spinlock is acquired and released in rto_push_irq_work_func()
> > IRQ work if hotplug is in progress. It was only reported couple of
> > times during long stress testing. The issue can be easily reproduced
> > if an artificial delay is introduced between lock and unlock of
> > rto_lock. The rq->rd is changed under rq->lock, so we can protect this
> > race with rq->lock. The below patch solved the problem. we are taking
> > rq->lock in pull_rt_task()->tell_cpu_to_push(), so I extended the same
> > here. Please let me know your thoughts on this.
>
> As so rq->rd can change. Interesting.
>
> >
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index d863d39..478192b 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -2284,6 +2284,7 @@ void rto_push_irq_work_func(struct irq_work *work)
> >  		raw_spin_unlock(&rq->lock);
> >  	}
> >
> > +	raw_spin_lock(&rq->lock);
>
>
> What about just saving the rd then?
>
> 	struct root_domain *rd;
>
> 	rd = READ_ONCE(rq->rd);
>
> then use that. Then we don't need to worry about it changing.
>

I am thinking of another problem caused by the race between
rto_push_irq_work_func() and rq_attach_root(), where rq->rd is modified.

Let's say we cache rq->rd here and queue the IRQ work on a remote CPU.
In the meantime, rq_attach_root() might drop all the references to this
cached (old) rd and try to free it. The rd is freed in an RCU-sched
callback. If that remote CPU is in an RCU quiescent state, the old rd
can get freed before the IRQ work is executed. This results in the
corruption of the remote CPU's IRQ work list. Right?
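To make the ordering above concrete, here is a rough userspace model of
it. This is only a sketch, not kernel code: the structs are toy
stand-ins, attach_root() is a hypothetical simplification of
rq_attach_root(), and a "freed" flag replaces the real free done from
the RCU-sched callback so that the stale access can be observed safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; not the real definitions. */
struct root_domain {
	int  refcount;	/* references held by runqueues */
	bool freed;	/* set instead of calling free(), so the bug is visible */
};

struct rq {
	struct root_domain *rd;
};

/*
 * Hypothetical, simplified model of rq_attach_root(): point rq at new_rd
 * and drop the reference on the old one. Reaching zero stands in for the
 * RCU-sched callback having already freed the old root domain.
 */
static void attach_root(struct rq *rq, struct root_domain *new_rd)
{
	struct root_domain *old = rq->rd;

	assert(new_rd != NULL);
	new_rd->refcount++;
	rq->rd = new_rd;
	if (old && --old->refcount == 0)
		old->freed = true;
}

/* Models the queued IRQ work touching the rd it was queued with. */
static bool irq_work_sees_live_rd(const struct root_domain *cached)
{
	return !cached->freed;
}

/* Returns true only if the cached rd is still alive when the work runs. */
bool cached_rd_survives_hotplug(void)
{
	struct root_domain old_rd = { .refcount = 0, .freed = false };
	struct root_domain new_rd = { .refcount = 0, .freed = false };
	struct rq rq = { .rd = NULL };
	struct root_domain *cached;

	attach_root(&rq, &old_rd);	/* rq starts out on old_rd */
	cached = rq.rd;			/* snapshot, as with READ_ONCE(rq->rd) */
	attach_root(&rq, &new_rd);	/* hotplug moves rq before the work runs */

	return irq_work_sees_live_rd(cached);
}
```

In this model cached_rd_survives_hotplug() returns false: the snapshot
keeps lock and unlock on the same object, but nothing keeps that object
alive until the IRQ work has run.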
Taking rq->lock in rto_push_irq_work_func() also does not help here.
Probably we have to wait for the IRQ work to finish before freeing the
older root domain in the RCU-sched callback.

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project.