(cc'ing Lai, hi!)

There have been some changes in how workqueue handles CPU hotplug
recently. Maybe it's related? Lai, can you please take a look?
Christian also added that the problem can be reproduced on 3.16 w/
lower frequency.

Thanks.

On Fri, Aug 29, 2014 at 12:50:35PM +0200, Christian Borntraeger wrote:
> Tejun,
>
> with kvm/next (pretty close to 3.17-rc1) as KVM guest I get the following warning in one of my stress tests:
>
> [ 0.296047] ------------[ cut here ]------------
> [ 0.296050] WARNING: at kernel/workqueue.c:809
> [ 0.296051] Modules linked in:
> [ 0.296054] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc1+ #172
> [ 0.296056] task: 0000000000934618 ti: 000000000091c000 task.ti: 000000000091c000
> [ 0.296062] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
> Krnl GPRS: 0000000000000001 00000000063bed00 0000000000996740 0000000000000000
> [ 0.296065] 000000000000000d 000000000000036a ffffffff00000000 0000000000000000
> [ 0.296067] 000000000620e400 0000000000001201 0000000000001201 04000000009f6700
> [ 0.296068] 000000000639b700 0000000000000000 00000000001453e8 00000000063fbc80
> [ 0.296078] Krnl Code: 0000000000145416: 95002000 cli 0(%r2),0
> 000000000014541a: a774fff4 brc 7,145402
> #000000000014541e: a7f40001 brc 15,145420
> [ 0.296083] TCP: cubic registered
> [ 0.296084]
> >0000000000145422: 92012000 mvi 0(%r2),1
> [ 0.296086]
> 0000000000145426: a7f4ffee brc 15,145402
> 000000000014542a: 0707 bcr 0,%r7
> 000000000014542c: ebdff0800024 stmg %r13,%r15,128(%r15)
> [ 0.296092] Initializing XFRM netlink socket
> [ 0.296094]
> 0000000000145432: a7f13fe0 tmll %r15,16352
> [ 0.296096] Call Trace:
> [ 0.296099] ([<0000000000001201>] 0x1201)
> [ 0.296102] [<00000000001545dc>] ttwu_do_activate.constprop.97+0x64/0x7c
> [ 0.296103] [<00000000001550ce>] sched_ttwu_pending+0x7e/0xd4
> [ 0.296104] [<0000000000156bea>] scheduler_ipi+0x62/0x168
> [ 0.296107] [<0000000000113482>] smp_handle_ext_call+0xbe/0xdc
> [ 0.296111] [<000000000010b3dc>] do_ext_interrupt+0xb4/0xd4
> [ 0.296113] [<00000000001833be>] handle_irq_event_percpu+0x76/0x204
> [ 0.296115] [<00000000001870e8>] handle_percpu_irq+0x6c/0x98
> [ 0.296116] [<0000000000182a02>] generic_handle_irq+0x46/0x68
> [ 0.296117] [<000000000010b78a>] do_IRQ+0x5e/0x84
> [ 0.296120] [<0000000000634840>] ext_skip+0x42/0x46
> [ 0.296121] [<0000000000633fce>] vtime_stop_cpu+0x4a/0x9c
> [ 0.296122] ([<0000000000000000>] (null))
> [ 0.296124] [<0000000000103816>] arch_cpu_idle+0x92/0xa0
> [ 0.296126] [<000000000016bb9a>] cpu_startup_entry+0x15a/0x228
> [ 0.296127] [<00000000009a5ae4>] start_kernel+0x408/0x418
> [ 0.296128] [<0000000000100020>] _stext+0x20/0x80
> [ 0.296129] Last Breaking-Event-Address:
> [ 0.296130] [<000000000014541e>] wq_worker_waking_up+0x5e/0x6c
> [ 0.296133] ---[ end trace 68915e61d289d806 ]---
> [ 0.296137] reboot: Restarting system
> [ 0.296141] ------------[ cut here ]------------
>
> The test is basically to start 50 KVM guests that only have a kernel + busybox ramdisk with rcS calling reboot.
> One or two guests of these 50 have this warning pretty soon. With 3.16 as guest everything runs fine.
>
> Do you have any idea what might be wrong or do I need to bisect?
>
> Christian
>

--
tejun