> From: Valentin Schneider <valentin.schneider@xxxxxxx>
> Sent: Tuesday, December 22, 2020 5:40 AM
> To: Dexuan Cui <decui@xxxxxxxxxxxxx>
> Cc: mingo@xxxxxxxxxx; peterz@xxxxxxxxxxxxx; juri.lelli@xxxxxxxxxx;
> vincent.guittot@xxxxxxxxxx; dietmar.eggemann@xxxxxxx;
> rostedt@xxxxxxxxxxx; bsegall@xxxxxxxxxx; mgorman@xxxxxxx;
> bristot@xxxxxxxxxx; x86@xxxxxxxxxx; linux-pm@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; Michael Kelley
> <mikelley@xxxxxxxxxxxxx>
> Subject: Re: v5.10: sched_cpu_dying() hits BUG_ON during hibernation: kernel
> BUG at kernel/sched/core.c:7596!
>
> Hi,
>
> On 22/12/20 09:13, Dexuan Cui wrote:
> > Hi,
> > I'm running a Linux VM with the recent mainline (48342fc07272, 12/20/2020)
> > on Hyper-V.
> > When I test hibernation, the VM can easily hit the below BUG_ON during the
> > resume procedure (I estimate this can repro about 1/5 of the time). BTW, my
> > VM has 40 vCPUs.
> >
> > I can't repro the BUG_ON with v5.9.0, so I suspect something in v5.10.0 may
> > be broken?
> >
> > In v5.10.0, when the BUG_ON happens, rq->nr_running==2, and
> > rq->nr_pinned==0:
> >
> > 7587 int sched_cpu_dying(unsigned int cpu)
> > 7588 {
> > 7589         struct rq *rq = cpu_rq(cpu);
> > 7590         struct rq_flags rf;
> > 7591
> > 7592         /* Handle pending wakeups and then migrate everything off */
> > 7593         sched_tick_stop(cpu);
> > 7594
> > 7595         rq_lock_irqsave(rq, &rf);
> > 7596         BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq));
> > 7597         rq_unlock_irqrestore(rq, &rf);
> > 7598
> > 7599         calc_load_migrate(rq);
> > 7600         update_max_interval();
> > 7601         nohz_balance_exit_idle(rq);
> > 7602         hrtick_clear(rq);
> > 7603         return 0;
> > 7604 }
> >
> > The last commit that touches the BUG_ON line is the commit
> > 3015ef4b98f5 ("sched/core: Make migrate disable and CPU hotplug
> > cooperative")
> > but the commit looks good to me.
> >
> > Any idea?
> >
>
> I'd wager this extra task is a kworker; could you give this series a try?
>
> https://lore.kernel.org/lkml/20201218170919.2950-1-jiangshanlai@xxxxxxxxx/

Thanks, Valentin! It looks like the patchset can fix the BUG_ON, though I see
a warning, which I reported here:
https://lkml.org/lkml/2020/12/22/648

Thanks,
-- Dexuan