2016-08-11 2:53 GMT+08:00 Frederic Weisbecker <fweisbec@xxxxxxxxx>:
> On Wed, Aug 10, 2016 at 09:23:11PM +0800, Wanpeng Li wrote:
>> 2016-08-10 20:43 GMT+08:00 Frederic Weisbecker <fweisbec@xxxxxxxxx>:
>> > On Thu, Aug 04, 2016 at 05:51:20PM +0800, Wanpeng Li wrote:
>> >> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>> >>
>> >> The dl task will be replenished after dl task timer fire and start a new
>> >> period. It will be enqueued and to re-evaluate its dependency on the tick
>> >> in order to restart it. However, if cpu is hot-unplug, irq_work_queue will
>> >> splash since the target cpu is offline.
>> >>
>> >> As a result:
>> >>
>> >> WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0
>> >> Call Trace:
>> >>  dump_stack+0x99/0xd0
>> >>  __warn+0xd1/0xf0
>> >>  warn_slowpath_null+0x1d/0x20
>> >>  irq_work_queue_on+0xad/0xe0
>> >>  tick_nohz_full_kick_cpu+0x44/0x50
>> >>  tick_nohz_dep_set_cpu+0x74/0xb0
>> >>  enqueue_task_dl+0x226/0x480
>> >>  activate_task+0x5c/0xa0
>> >>  dl_task_timer+0x19b/0x2c0
>> >>  ? push_dl_task.part.31+0x190/0x190
>> >>
>> >> This can be triggered by hot-unplug the full dynticks cpu which dl task
>> >> is running on.
>> >>
>> >> Actually we don't need to restart the tick since the target cpu is offline
>> >> and nothing need scheduler tick. This patch fix it by not intend to re-evaluate
>> >> tick dependency if the cpu is offline.
>> >>
>> >> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> >> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> >> Cc: Juri Lelli <juri.lelli@xxxxxxx>
>> >> Cc: Luca Abeni <luca.abeni@xxxxxxxx>
>> >> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>> >> ---
>> >>  kernel/sched/core.c | 3 +++
>> >>  1 file changed, 3 insertions(+)
>> >>
>> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> >> index 7f2cae4..43b494f 100644
>> >> --- a/kernel/sched/core.c
>> >> +++ b/kernel/sched/core.c
>> >> @@ -628,6 +628,9 @@ bool sched_can_stop_tick(struct rq *rq)
>> >>  {
>> >>         int fifo_nr_running;
>> >>
>> >> +       if (unlikely(!rq->online))
>> >> +               return true;
>> >> +
>> >
>> > I see, the CPU is offline but the tasks haven't been migrated yet.
>> > That said it seems that rollback is still possible at this stage.
>> >
>> > Somehow we may need to deal with it.
>>
>> Thanks for your review, Frederic. :) The rq lock is held to serialize
>> concurrent cpu hot-plug and dl task enqueue path(sched_can_stop_tick()
>> is called in this path), so I think there is no issue here.
>
> It's not about concurrency though. It's rather that if the CPU runs
> tickless, does cpu_down() and fails, then if the dl task needs the tick and
> we ignore the IPI due to cpu_is_offline(), we may be still running tickless
> forever after cpu_down() failure exit.

If the cpu is offline when the dl task timer fires, the dl task will be
migrated to another suitable cpu, so there is no issue if the cpu hot-unplug
fails and the cpu comes online again.

Regards,
Wanpeng Li