> On Dec 12, 2022, at 6:45 AM, Zhang, Qiang1 <qiang1.zhang@xxxxxxxxx> wrote:
>
> On Fri, Dec 9, 2022 at 11:34 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>>
>> Hello,
>>
>> I see this an hour into the run for TREE03 on v4.19:
>> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 32 --duration 60
>>
>> Checking to see if Qiang has any thoughts as I saw him comment about a similar
>> issue in [1].
>>
>> [ 3243.844445] ------------[ cut here ]------------
>> [ 3243.847112] WARNING: CPU: 1 PID: 0 at kernel/kthread.c:411 __kthread_bind_mask+0x19/0x60
>> [ 3243.851585] Modules linked in:
>> [ 3243.853295] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.267+ #1
>> [ 3243.856699] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
>> [ 3243.860052] RIP: 0010:__kthread_bind_mask+0x19/0x60
>> [ 3243.861769] Code: 48 89 42 20 c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 41 55 41 54 55 48 89 f5 48 89 d6 53 48 89 fb e8 3c e9 00 00 48 85 c0 75 09 <0f> 0b 5b 5d 41 5c 41 5d c3 4c 8d ab b4 07 00 00 4c 89 ef e8 6f 28
>> [ 3243.867751] RSP: 0000:ffffa54b8014fec0 EFLAGS: 00010246
>> [ 3243.868959] RAX: 0000000000000000 RBX: ffff95555ee3c240 RCX: 0000000000000000
>> [ 3243.870606] RDX: ffff95555f063d80 RSI: 0000000000000246 RDI: 00000000ffffffff
>> [ 3243.872246] RBP: ffffffffb720a010 R08: af82bdcb11f9f4ff R09: 0000000000000000
>> [ 3243.873894] R10: ffffa54b8014fe70 R11: 00000000621ccdc2 R12: 0000000000000000
>> [ 3243.875538] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>> [ 3243.877181] FS: 0000000000000000(0000) GS:ffff95555f040000(0000) knlGS:0000000000000000
>> [ 3243.878698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 3243.879747] CR2: 0000000000000000 CR3: 000000001300a000 CR4: 00000000000006e0
>> [ 3243.881050] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 3243.882358] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 3243.883660] Call Trace:
>> [ 3243.884925] kthread_unpark+0x52/0x60
>> [ 3243.885613] cpuhp_online_idle+0x31/0x50
>> [ 3243.886341] cpu_startup_entry+0x62/0x70
>> [ 3243.887063] start_secondary+0x186/0x1b0
>> [ 3243.887783] secondary_startup_64+0xa4/0xb0
>>
>> [1] https://groups.google.com/g/syzkaller-bugs/c/w_LARy6pxvQ/m/dKjQyHAxAQAJ
>>
>>
>> This set of scheduler patches seems to make it go away, however I am
>> running a long weekend test to collect more data:
>> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/log/?h=rcu/torture/v4.19.fixes.wip.120922
>
>
> Hi Joel
>
> I also used the link above to do the TREE03 test, looks like the problem still exists:
>
> 5834 [ 966.016205] kthread_unpark+0x50/0x60
> 5835 [ 966.016440] cpuhp_online_idle+0x31/0x50
> 5836 [ 966.016678] cpu_startup_entry+0x65/0x70
> 5837 [ 966.016913] start_secondary+0x18b/0x1c0
> 5838 [ 966.017148] secondary_startup_64+0xa4/0xb0
> 5839 [ 966.017407] ---[ end trace 636953f76b8055db ]---
> 5840 [ 966.017691] migration/1 R running task 14952 18 2 0x80000000 last_sleep: 961405359084. last_runnable: 961700816631
> 5841 [ 966.018406] Call Trace:
> 5842 [ 966.018559] __schedule+0x75f/0x1320
> 5843 [ 966.018775] ? cpuhp_invoke_callback+0x88/0x600
> 5844 [ 966.019042] ? ___preempt_schedule+0x16/0x18
> 5845 [ 966.019302] ? sort_range+0x20/0x20
> 5846 [ 966.019515] preempt_schedule_common+0x32/0x80
> 5847 [ 966.019776] ___preempt_schedule+0x16/0x18
> 5848 [ 966.020018] _raw_spin_unlock_irq+0x1f/0x20
> 5849 [ 966.020275] smpboot_thread_fn+0x195/0x230
> 5850 [ 966.020525] kthread+0x139/0x160
> 5851 [ 966.020717] ? kthread_create_worker_on_cpu+0x60/0x60
> 5852 [ 966.021014] ret_from_fork+0x35/0x40
>
> From the calltrace, the state of migration kthreads is not correct,
> I'm also looking at what's going wrong.
>
>Thanks for checking. I also noticed that the state of those threads is running instead of parked, causing the warning.
>
>- Joel

Hi Joel,

I found that TREE03 enables CONFIG_SCHED_CORE=y. Under this condition, if the
CPU is offline, pick_next_task() directly selects the idle task, which then
preempts the migration kthread. The calltrace also shows that the migration/1
kthread was preempted and scheduled out:

[ 966.017691] migration/1 R running task 14952
[ 966.018559] __schedule+0x75f/0x1320
[ 966.018775] ? cpuhp_invoke_callback+0x88/0x600
[ 966.019042] ? ___preempt_schedule+0x16/0x18
[ 966.019302] ? sort_range+0x20/0x20
[ 966.019515] preempt_schedule_common+0x32/0x80
[ 966.019776] ___preempt_schedule+0x16/0x18
[ 966.020018] _raw_spin_unlock_irq+0x1f/0x20
[ 966.020275] smpboot_thread_fn+0x195/0x230
[ 966.020525] kthread+0x139/0x160
[ 966.020717] ? kthread_create_worker_on_cpu+0x60/0x60
[ 966.021014] ret_from_fork+0x35/0x40

The following change fixes this warning (it is taken from a newer kernel
version):

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0acbc7706d71..1c76226a2c26 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4109,8 +4109,10 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	bool fi_before = false;
 	cpu = cpu_of(rq);
-	if (cpu_is_offline(cpu))
-		return idle_sched_class.pick_next_task(rq, prev, rf);
+	if (cpu_is_offline(cpu)) {
+		rq->core_pick = NULL;
+		return __pick_next_task(rq, prev, rf);
+	}

Thanks
Zqiang

>
>
>
> Thanks
> Zqiang
>
>>
>> Thanks.
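
To make the difference between the removed and added lines easier to see, here is a toy userspace model, not kernel code. It assumes the migration/N kthread can be represented as a runnable task in the highest-priority class, and every name in it (toy_rq, toy_pick_old, toy_pick_new, the TOY_* classes) is hypothetical. It only sketches the idea that the old offline path hands back idle unconditionally, while the new path still walks the classes in priority order.

```c
#include <stdbool.h>

/* Toy scheduling classes in priority order (hypothetical names, not kernel symbols). */
enum toy_class { TOY_STOP, TOY_RT, TOY_FAIR, TOY_IDLE, TOY_NR_CLASSES };

/* Toy runqueue: records only CPU state and which classes have a runnable task. */
struct toy_rq {
	bool online;
	bool runnable[TOY_NR_CLASSES];
};

/* Class walk: the highest-priority runnable class wins, idle is the fallback. */
static enum toy_class toy_pick_by_class(const struct toy_rq *rq)
{
	for (int c = TOY_STOP; c < TOY_IDLE; c++) {
		if (rq->runnable[c])
			return (enum toy_class)c;
	}
	return TOY_IDLE;
}

/* Models the removed lines: an offline CPU is handed idle unconditionally. */
static enum toy_class toy_pick_old(const struct toy_rq *rq)
{
	if (!rq->online)
		return TOY_IDLE;
	return toy_pick_by_class(rq);
}

/* Models the added lines: an offline CPU still goes through the class walk. */
static enum toy_class toy_pick_new(const struct toy_rq *rq)
{
	if (!rq->online)
		return toy_pick_by_class(rq); /* stands in for __pick_next_task() */
	return toy_pick_by_class(rq);         /* core-wide selection is not modelled */
}
```

With an offline toy runqueue whose TOY_STOP slot is runnable, toy_pick_old() returns TOY_IDLE while toy_pick_new() returns TOY_STOP. That is the toy equivalent of the migration kthread no longer being pushed aside by idle during hotplug.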