Hi, I'm running a Linux VM with the recent mainline (48342fc07272, 12/20/2020) on Hyper-V. When I test hibernation, the VM can easily hit the below BUG_ON during the resume procedure (I estimate this can repro about 1/5 of the time). BTW, my VM has 40 vCPUs. I can't repro the BUG_ON with v5.9.0, so I suspect something in v5.10.0 may be broken? In v5.10.0, when the BUG_ON happens, rq->nr_running==2, and rq->nr_pinned==0: 7587 int sched_cpu_dying(unsigned int cpu) 7588 { 7589 struct rq *rq = cpu_rq(cpu); 7590 struct rq_flags rf; 7591 7592 /* Handle pending wakeups and then migrate everything off */ 7593 sched_tick_stop(cpu); 7594 7595 rq_lock_irqsave(rq, &rf); 7596 BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq)); 7597 rq_unlock_irqrestore(rq, &rf); 7598 7599 calc_load_migrate(rq); 7600 update_max_interval(); 7601 nohz_balance_exit_idle(rq); 7602 hrtick_clear(rq); 7603 return 0; 7604 } The last commit that touches the BUG_ON line is the commit 3015ef4b98f5 ("sched/core: Make migrate disable and CPU hotplug cooperative") but the commit looks good to me. Any idea? Thanks, -- Dexuan [ 5.383378] PM: Image signature found, resuming [ 5.388027] PM: hibernation: resume from hibernation [ 5.395794] Freezing user space processes ... (elapsed 0.001 seconds) done. [ 5.397058] OOM killer disabled. [ 5.397059] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 5.413013] PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00000fff] [ 5.419331] PM: hibernation: Marking nosave pages: [mem 0x0009f000-0x000fffff] [ 5.425502] PM: hibernation: Marking nosave pages: [mem 0xf7ff0000-0xffffffff] [ 5.431706] PM: hibernation: Basic memory bitmaps created [ 5.465205] PM: Using 3 thread(s) for decompression [ 5.469590] PM: Loading and decompressing image data (505553 pages)... [ 5.492790] PM: Image loading progress: 0% [ 6.532641] PM: Image loading progress: 10% [ 6.899683] PM: Image loading progress: 20% [ 7.251672] PM: Image loading progress: 30% [ 7.598808] PM: Image loading progress: 40% [ 8.056153] PM: Image loading progress: 50% [ 8.534077] PM: Image loading progress: 60% [ 9.029886] PM: Image loading progress: 70% [ 9.542015] PM: Image loading progress: 80% [ 10.025326] PM: Image loading progress: 90% [ 10.525804] PM: Image loading progress: 100% [ 10.530241] PM: Image loading done [ 10.533295] PM: hibernation: Read 2022212 kbytes in 5.05 seconds (400.43 MB/s) [ 10.540827] PM: Image successfully loaded [ 10.599243] serial 00:04: disabled [ 10.619935] vmbus 242ff919-07db-4180-9c2e-b86cb68c8c55: parent VMBUS:01 should not be sleeping [ 10.646542] Disabling non-boot CPUs ... [ 10.694380] smpboot: CPU 1 is now offline [ 10.729044] smpboot: CPU 2 is now offline [ 10.756819] smpboot: CPU 3 is now offline [ 10.776784] smpboot: CPU 4 is now offline [ 10.800866] smpboot: CPU 5 is now offline [ 10.828923] smpboot: CPU 6 is now offline [ 10.849013] smpboot: CPU 7 is now offline [ 10.876722] smpboot: CPU 8 is now offline [ 10.909426] smpboot: CPU 9 is now offline [ 10.929360] smpboot: CPU 10 is now offline [ 10.957059] smpboot: CPU 11 is now offline [ 10.976542] smpboot: CPU 12 is now offline [ 11.008470] smpboot: CPU 13 is now offline [ 11.036356] smpboot: CPU 14 is now offline [ 11.072396] smpboot: CPU 15 is now offline [ 11.100229] smpboot: CPU 16 is now offline [ 11.128638] smpboot: CPU 17 is now offline [ 11.148479] smpboot: CPU 18 is now offline [ 11.173537] smpboot: CPU 19 is now offline [ 11.205680] smpboot: CPU 20 is now offline [ 11.231799] smpboot: CPU 21 is now offline [ 11.259562] smpboot: CPU 22 is now offline [ 11.288672] smpboot: CPU 23 is now offline [ 11.319691] smpboot: CPU 24 is now offline [ 11.355523] smpboot: CPU 25 is now offline [ 11.399469] smpboot: CPU 26 is now offline [ 11.435438] smpboot: CPU 27 is now offline [ 11.471402] smpboot: CPU 28 is now offline [ 11.515488] smpboot: CPU 29 is now offline [ 11.551355] smpboot: CPU 30 is now offline [ 11.591326] smpboot: CPU 31 is now offline [ 11.624004] smpboot: CPU 32 is now offline [ 11.659326] smpboot: CPU 33 is now offline [ 11.687478] smpboot: CPU 34 is now offline [ 11.719243] smpboot: CPU 35 is now offline [ 11.747252] smpboot: CPU 36 is now offline [ 11.771224] smpboot: CPU 37 is now offline [ 11.795089] ------------[ cut here ]------------ [ 11.798052] kernel BUG at kernel/sched/core.c:7596! [ 11.798052] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 11.798052] CPU: 38 PID: 202 Comm: migration/38 Tainted: G E 5.10.0+ #6 [ 11.798052] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 [ 11.798052] Stopper: multi_cpu_stop+0x0/0xf0 <- 0x0 [ 11.798052] RIP: 0010:sched_cpu_dying+0xa3/0xc0 [ 11.798052] Code: 73 85 05 00 84 c0 75 12 48 83 c4 08 31 c0 5b c3 f0 48 01 05 af f4 7e ... [ 11.798052] RSP: 0000:ffffbb3c820bfde0 EFLAGS: 00010002 [ 11.798052] RAX: 0000000000000082 RBX: ffff94f83acaac40 RCX: 0000000000000000 [ 11.798052] RDX: 0000000000000001 RSI: 000000000000005a RDI: 0000000000000001 [ 11.798052] RBP: 0000000000000026 R08: 0000000000000000 R09: 0000000000000000 [ 11.798052] R10: 000000000000000b R11: ffffffff89cbed88 R12: ffffffff88aa7ed0 [ 11.798052] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003 [ 11.798052] FS: 0000000000000000(0000) GS:ffff94f83ac80000(0000) knlGS:0000000000000000 [ 11.798052] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 11.798052] CR2: 0000000000000000 CR3: 00000001144fa002 CR4: 00000000003706e0 [ 11.798052] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 11.798052] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 11.798052] Call Trace: [ 11.798052] ? rcutree_dying_cpu+0x12/0x30 [ 11.798052] cpuhp_invoke_callback+0x82/0x4a0 [ 11.798052] take_cpu_down+0x67/0xa0 [ 11.798052] multi_cpu_stop+0x64/0xf0 [ 11.798052] ? stop_machine_yield+0x10/0x10 [ 11.798052] cpu_stopper_thread+0x95/0x130 [ 11.798052] ? sort_range+0x20/0x20 [ 11.798052] smpboot_thread_fn+0x198/0x230 [ 11.798052] kthread+0x13d/0x160 [ 11.798052] ? kthread_create_on_node+0x60/0x60 [ 11.798052] ret_from_fork+0x22/0x30 [ 11.798052] Modules linked in: hv_utils(E) cn(E) hid_generic(E) ... [ 11.798052] ---[ end trace 9f1a31ebcf9c45a1 ]--- [ 11.798052] RIP: 0010:sched_cpu_dying+0xa3/0xc0 [ 11.798052] Code: 73 85 05 00 84 c0 75 12 48 83 c4 08 31 c0 5b c3 f0 48 01 05 af f4 7e ... [ 11.798052] RSP: 0000:ffffbb3c820bfde0 EFLAGS: 00010002 [ 11.798052] RAX: 0000000000000082 RBX: ffff94f83acaac40 RCX: 0000000000000000 [ 11.798052] RDX: 0000000000000001 RSI: 000000000000005a RDI: 0000000000000001 [ 11.798052] RBP: 0000000000000026 R08: 0000000000000000 R09: 0000000000000000 [ 11.798052] R10: 000000000000000b R11: ffffffff89cbed88 R12: ffffffff88aa7ed0 [ 11.798052] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000003 [ 11.798052] FS: 0000000000000000(0000) GS:ffff94f83ac80000(0000) knlGS:0000000000000000 [ 11.798052] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 11.798052] CR2: 0000000000000000 CR3: 00000001144fa002 CR4: 00000000003706e0 [ 11.798052] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 11.798052] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 11.798052] note: migration/38[202] exited with preempt_count 2