From: Tang Chen <tangchen@xxxxxxxxxxxxxx> If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Here is a call trace: [ 609.824017] Call Trace: [ 609.824017] <IRQ> [ 609.824017] [<ffffffff810b0721>] select_fallback_rq+0x71/0x190 [ 609.824017] [<ffffffff810b086e>] ? try_to_wake_up+0x2e/0x2f0 [ 609.824017] [<ffffffff810b0b0b>] try_to_wake_up+0x2cb/0x2f0 [ 609.824017] [<ffffffff8109da08>] ? __run_hrtimer+0x78/0x320 [ 609.824017] [<ffffffff810b0b85>] wake_up_process+0x15/0x20 [ 609.824017] [<ffffffff8109ce62>] hrtimer_wakeup+0x22/0x30 [ 609.824017] [<ffffffff8109da13>] __run_hrtimer+0x83/0x320 [ 609.824017] [<ffffffff8109ce40>] ? update_rmtp+0x80/0x80 [ 609.824017] [<ffffffff8109df56>] hrtimer_interrupt+0x106/0x280 [ 609.824017] [<ffffffff810a72c8>] ? sd_free_ctl_entry+0x68/0x70 [ 609.824017] [<ffffffff8167cf39>] smp_apic_timer_interrupt+0x69/0x99 [ 609.824017] [<ffffffff8167be2f>] apic_timer_interrupt+0x6f/0x80 There is a hrtimer process sleeping, whose cpu has already been offlined. When it is waken up, it tries to find another cpu to run, and get a -1 nid. As a result, cpumask_of_node(-1) returns NULL, and causes ernel panic. This patch fixes this problem by judging if the nid is -1. If nid is not -1, a cpu on the same node will be picked. Else, a online cpu on another node will be picked. Cc: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx> Cc: David Rientjes <rientjes@xxxxxxxxxx> Cc: Jiang Liu <liuj97@xxxxxxxxx> Cc: Minchan Kim <minchan.kim@xxxxxxxxx> Cc: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> Cc: Mel Gorman <mel@xxxxxxxxx> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Cc: Ingo Molnar <mingo@xxxxxxxxxx> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> Signed-off-by: Tang Chen <tangchen@xxxxxxxxxxxxxx> Signed-off-by: Wen Congyang <wency@xxxxxxxxxxxxxx> --- kernel/sched/core.c | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 2d8927f..4e6404e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1106,18 +1106,28 @@ EXPORT_SYMBOL_GPL(kick_process); */ static int select_fallback_rq(int cpu, struct task_struct *p) { - const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu)); + int nid = cpu_to_node(cpu); + const struct cpumask *nodemask = NULL; enum { cpuset, possible, fail } state = cpuset; int dest_cpu; - /* Look for allowed, online CPU in same node. */ - for_each_cpu(dest_cpu, nodemask) { - if (!cpu_online(dest_cpu)) - continue; - if (!cpu_active(dest_cpu)) - continue; - if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) - return dest_cpu; + /* + * If the node that the cpu is on has been offlined, cpu_to_node() + * will return -1. There is no cpu on the node, and we should + * select the cpu on the other node. + */ + if (nid != -1) { + nodemask = cpumask_of_node(nid); + + /* Look for allowed, online CPU in same node. */ + for_each_cpu(dest_cpu, nodemask) { + if (!cpu_online(dest_cpu)) + continue; + if (!cpu_active(dest_cpu)) + continue; + if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p))) + return dest_cpu; + } } for (;;) { -- 1.8.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>