On Wed, Aug 10, 2016 at 09:23:11PM +0800, Wanpeng Li wrote:
> 2016-08-10 20:43 GMT+08:00 Frederic Weisbecker <fweisbec@xxxxxxxxx>:
> > On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
> >> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> >>
> >> The dl task will be replenished after dl task timer fire and start a new
> >> period. It will be enqueued and to re-evaluate its dependency on the tick
> >> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
> >> splash since the target cpu is offline.
> >>
> >> As a result:
> >>
> >> WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
> >> Call Trace:
> >>  dump_stack+0x99/0xd0
> >>  __warn+0xd1/0xf0
> >>  warn_slowpath_null+0x1d/0x20
> >>  irq_work_queue_on+0xad/0xe0
> >>  tick_nohz_full_kick_cpu+0x44/0x50
> >>  tick_nohz_dep_set_cpu+0x74/0xb0
> >>  enqueue_task_dl+0x226/0x480
> >>  activate_task+0x5c/0xa0
> >>  dl_task_timer+0x19b/0x2c0
> >>  ? push_dl_task.part.31+0x190/0x190
> >>
> >> This can be triggered by hot-unplug the full dynticks cpu which dl task
> >> is running on.
> >>
> >> Actually we don't need to restart the tick since the target cpu is offline
> >> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
> >> tick dependency if the cpu is offline.
> >>
> >> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> >> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> >> Cc: Juri Lelli <juri.lelli@xxxxxxx>
> >> Cc: Luca Abeni <luca.abeni@xxxxxxxx>
> >> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> >> ---
> >>  kernel/sched/core.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index 7f2cae4..43b494f 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
> >>  {
> >>  	int fifo_nr_running;
> >>
> >> +	if (unlikely(!rq->online))
> >> +		return true;
> >> +
> >
> > I see, the CPU is offline but the tasks haven't been migrated yet.
> > That said it seems that rollback is still possible at this stage.
> >
> > Somehow we may need to deal with it.
>
> Thanks for your review, Frederic. :) The rq lock is held to serialize
> concurrent cpu hot-plug and dl task enqueue path(sched_can_stop_tick()
> is called in this path), so I think there is no issue here.

It's not about concurrency though. It's rather that if the CPU runs tickless,
does cpu_down() and fails, then if the dl task needs the tick and we ignore
the IPI due to cpu_is_offline(), we may be still running tickless forever
after cpu_down() failure exit.
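
To make the rollback concern concrete, here is a rough user-space sketch of the sequence described above. It is only a toy model under stated assumptions, not kernel code: struct toy_cpu and every toy_* function are hypothetical names invented for illustration, and the enqueue path is reduced to "restart the tick unless the check says it can stay stopped".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for per-CPU state; NOT a kernel structure (hypothetical). */
struct toy_cpu {
	bool online;        /* models rq->online */
	bool tick_stopped;  /* CPU is currently running tickless */
	bool dl_needs_tick; /* an enqueued dl task depends on the tick */
};

/* Models sched_can_stop_tick() with the proposed early return. */
static bool toy_can_stop_tick(const struct toy_cpu *cpu)
{
	if (!cpu->online)
		return true;            /* the check added by the patch */
	return !cpu->dl_needs_tick;
}

/* Models the dl enqueue path: restart the tick only if it may not stay stopped. */
static void toy_enqueue_dl(struct toy_cpu *cpu)
{
	cpu->dl_needs_tick = true;
	if (!toy_can_stop_tick(cpu))
		cpu->tick_stopped = false;  /* stands in for the kick/IPI */
}

/* Models a failed cpu_down(): the rq comes back online, nothing is re-evaluated. */
static void toy_cpu_down_rollback(struct toy_cpu *cpu)
{
	assert(!cpu->online);
	cpu->online = true;
}

/* Returns true if the CPU ends up tickless while a dl task needs the tick. */
static bool toy_stuck_tickless_after_rollback(void)
{
	struct toy_cpu cpu = { .online = true, .tick_stopped = true,
			       .dl_needs_tick = false };

	cpu.online = false;            /* cpu_down() starts, rq marked offline */
	toy_enqueue_dl(&cpu);          /* dl timer fires, kick is skipped */
	toy_cpu_down_rollback(&cpu);   /* cpu_down() fails and rolls back */

	return cpu.tick_stopped && cpu.dl_needs_tick;
}
```

In this model toy_stuck_tickless_after_rollback() returns true: the tick dependency was dropped while the rq was offline and nothing revisits it after the rollback. Whether the real hotplug path re-evaluates the dependency on rollback is exactly the open question in this thread; the sketch does not answer it.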