On 1/11/19 3:01 PM, Juergen Gross wrote:
> On 11/01/2019 14:12, Hans van Kranenburg wrote:
>> Hi,
>>
>> On 1/11/19 1:08 PM, Juergen Gross wrote:
>>> Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
>>> sched_clock() interface") broke Xen guest time handling across
>>> migration:
>>>
>>> [ 187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
>>> [ 187.251137] OOM killer disabled.
>>> [ 187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>> [ 187.252299] suspending xenstore...
>>> [ 187.266987] xen:grant_table: Grant tables using version 1 layout
>>> [18446743811.706476] OOM killer enabled.
>>> [18446743811.706478] Restarting tasks ... done.
>>> [18446743811.720505] Setting capacity to 16777216
>>>
>>> Fix that by setting xen_sched_clock_offset at resume time to ensure a
>>> monotonic clock value.
>>>
>>> [...]
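(Side note, mostly for my own understanding: below is a rough sketch of
the idea as I read it, in kernel-style C. This is not the actual patch;
xen_clocksource_read() is the existing raw Xen clocksource read in
arch/x86/xen/time.c, and the sketch_* hook names are made up.)

  static u64 xen_sched_clock_offset;  /* subtracted from raw reads */
  static u64 xen_clock_value_saved;   /* clock value just before suspend */

  static u64 xen_sched_clock(void)
  {
          return xen_clocksource_read() - xen_sched_clock_offset;
  }

  /* before suspend: remember where the clock was */
  static void sketch_save_sched_clock(void)
  {
          xen_clock_value_saved = xen_sched_clock();
  }

  /*
   * after resume: the raw clocksource state on the new host differs,
   * so re-base the offset to make xen_sched_clock() continue
   * monotonically from the saved value instead of jumping.
   */
  static void sketch_restore_sched_clock(void)
  {
          xen_sched_clock_offset = xen_clocksource_read() -
                                   xen_clock_value_saved;
  }

Without that re-basing, sched_clock() jumps after migration, which
would explain the 18446743811.x timestamps in the quoted log.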
>> I'm throwing around a PV domU over a bunch of test servers with live
>> migrate now, and in between the kernel logging, I'm seeing this:
>>
>> [Fri Jan 11 13:58:42 2019] Freezing user space processes ... (elapsed 0.002 seconds) done.
>> [Fri Jan 11 13:58:42 2019] OOM killer disabled.
>> [Fri Jan 11 13:58:42 2019] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
>> [Fri Jan 11 13:58:42 2019] suspending xenstore...
>> [Fri Jan 11 13:58:42 2019] ------------[ cut here ]------------
>> [Fri Jan 11 13:58:42 2019] Current state: 1
>> [Fri Jan 11 13:58:42 2019] WARNING: CPU: 3 PID: 0 at kernel/time/clockevents.c:133 clockevents_switch_state+0x48/0xe0
>> [Fri Jan 11 13:58:42 2019] Modules linked in:
>> [Fri Jan 11 13:58:42 2019] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.14+ #1
>> [Fri Jan 11 13:58:42 2019] RIP: e030:clockevents_switch_state+0x48/0xe0
>> [Fri Jan 11 13:58:42 2019] Code: 8b 0c cd 40 ee 00 82 e9 d6 5b d1 00 80 3d 8e 8d 43 01 00 75 17 89 c6 48 c7 c7 92 62 1f 82 c6 05 7c 8d 43 01 01 e8 f8 22 f8 ff <0f> 0b 5b 5d f3 c3 83 e2 01 74 f7 48 8b 47 48 48 85 c0 74 69 48 89
>> [Fri Jan 11 13:58:42 2019] RSP: e02b:ffffc90000787e30 EFLAGS: 00010082
>> [Fri Jan 11 13:58:42 2019] RAX: 0000000000000000 RBX: ffff88805df94d80 RCX: 0000000000000006
>> [Fri Jan 11 13:58:42 2019] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff88805df963f0
>> [Fri Jan 11 13:58:42 2019] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000119
>> [Fri Jan 11 13:58:42 2019] R10: 0000000000000020 R11: ffffffff82af4e2d R12: ffff88805df9ca40
>> [Fri Jan 11 13:58:42 2019] R13: 0000000dd28d6ca6 R14: 0000000000000000 R15: 0000000000000000
>> [Fri Jan 11 13:58:42 2019] FS: 00007f34193ce040(0000) GS:ffff88805df80000(0000) knlGS:0000000000000000
>> [Fri Jan 11 13:58:42 2019] CS: e033 DS: 002b ES: 002b CR0: 0000000080050033
>> [Fri Jan 11 13:58:42 2019] CR2: 00007f6220be50e1 CR3: 000000005ce5c000 CR4: 0000000000002660
>> [Fri Jan 11 13:58:42 2019] Call Trace:
>> [Fri Jan 11 13:58:42 2019] tick_program_event+0x4b/0x70
>> [Fri Jan 11 13:58:42 2019] hrtimer_try_to_cancel+0xa8/0x100
>> [Fri Jan 11 13:58:42 2019] hrtimer_cancel+0x10/0x20
>> [Fri Jan 11 13:58:42 2019] __tick_nohz_idle_restart_tick+0x45/0xd0
>> [Fri Jan 11 13:58:42 2019] tick_nohz_idle_exit+0x93/0xa0
>> [Fri Jan 11 13:58:42 2019] do_idle+0x149/0x260
>> [Fri Jan 11 13:58:42 2019] cpu_startup_entry+0x6a/0x70
>> [Fri Jan 11 13:58:42 2019] ---[ end trace 519c07d1032908f8 ]---
>> [Fri Jan 11 13:58:42 2019] xen:grant_table: Grant tables using version 1 layout
>> [Fri Jan 11 13:58:42 2019] OOM killer enabled.
>> [Fri Jan 11 13:58:42 2019] Restarting tasks ... done.
>> [Fri Jan 11 13:58:42 2019] Setting capacity to 6291456
>> [Fri Jan 11 13:58:42 2019] Setting capacity to 10485760
>>
>> This always happens on every *first* live migrate that I do after
>> starting the domU.

> Yeah, it's a WARN_ONCE().
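Ah, that explains why it only shows on the first migrate. For anyone
following along, the mechanism is roughly this (paraphrased and trimmed
from include/asm-generic/bug.h, not copied verbatim):

  /*
   * One-shot warning: the static flag flips on the first hit, so the
   * splat is printed only once per boot, no matter how often the bad
   * clockevent state transition happens afterwards.
   */
  #define WARN_ONCE(condition, format...) ({              \
          static bool __warned;                           \
          int __ret_warn_once = !!(condition);            \
                                                          \
          if (unlikely(__ret_warn_once && !__warned)) {   \
                  __warned = true;                        \
                  WARN(1, format);                        \
          }                                               \
          unlikely(__ret_warn_once);                      \
  })

So later migrations can hit the same bad state transition silently.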
> And you didn't see it with v1 of the patch?

No.

> At first glance this might be another bug just being exposed by
> my patch.
>
> I'm investigating further, but this might take some time. Could you
> meanwhile verify the same happens with kernel 5.0-rc1? That was the
> one I tested with and I didn't spot that WARN.

I have Linux 5.0-rc1 with v2 on top now, which gives me this on live migrate:

[ 51.845967] xen:grant_table: Grant tables using version 1 layout
[ 51.871076] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
[ 51.871091] #PF error: [normal kernel read fault]
[ 51.871100] PGD 0 P4D 0
[ 51.871109] Oops: 0000 [#1] SMP NOPTI
[ 51.871117] CPU: 0 PID: 36 Comm: xenwatch Not tainted 5.0.0-rc1 #1
[ 51.871132] RIP: e030:blk_mq_map_swqueue+0x103/0x270
[ 51.871141] Code: 41 39 45 30 76 97 8b 0a 85 c9 74 ed 89 c1 48 c1 e1 04 49 03 8c 24 c0 05 00 00 48 8b 09 42 8b 3c 39 49 8b 4c 24 58 48 8b 0c f9 <4c> 0f a3 b1 98 00 00 00 72 c5 f0 4c 0f ab b1 98 00 00 00 44 0f b7
[ 51.871161] RSP: e02b:ffffc900008afca8 EFLAGS: 00010282
[ 51.871173] RAX: 0000000000000000 RBX: ffffffff82541728 RCX: 0000000000000000
[ 51.871184] RDX: ffff88805d0fae70 RSI: ffff88805deaa940 RDI: 0000000000000001
[ 51.871196] RBP: ffff88805be8b720 R08: 0000000000000001 R09: ffffea0001699900
[ 51.871206] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88805be8b218
[ 51.871217] R13: ffff88805d0fae68 R14: 0000000000000001 R15: 0000000000000004
[ 51.871237] FS: 00007faa50fac040(0000) GS:ffff88805de00000(0000) knlGS:0000000000000000
[ 51.871252] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.871261] CR2: 0000000000000098 CR3: 000000005c6e6000 CR4: 0000000000002660
[ 51.871275] Call Trace:
[ 51.871285] blk_mq_update_nr_hw_queues+0x2fd/0x380
[ 51.871297] blkfront_resume+0x200/0x3f0
[ 51.871307] xenbus_dev_resume+0x48/0xf0
[ 51.871317] ? xenbus_dev_probe+0x120/0x120
[ 51.871326] dpm_run_callback+0x3c/0x160
[ 51.871336] device_resume+0xce/0x1d0
[ 51.871344] dpm_resume+0x115/0x2f0
[ 51.871352] ? find_watch+0x40/0x40
[ 51.871360] dpm_resume_end+0x8/0x10
[ 51.871370] do_suspend+0xef/0x1b0
[ 51.871378] shutdown_handler+0x123/0x150
[ 51.871387] xenwatch_thread+0xbb/0x160
[ 51.871397] ? wait_woken+0x80/0x80
[ 51.871406] kthread+0xf3/0x130
[ 51.871416] ? kthread_create_worker_on_cpu+0x70/0x70
[ 51.871427] ret_from_fork+0x35/0x40
[ 51.871435] Modules linked in:
[ 51.871443] CR2: 0000000000000098
[ 51.871452] ---[ end trace 84a3a6932d70aa71 ]---
[ 51.871461] RIP: e030:blk_mq_map_swqueue+0x103/0x270
[ 51.871471] Code: 41 39 45 30 76 97 8b 0a 85 c9 74 ed 89 c1 48 c1 e1 04 49 03 8c 24 c0 05 00 00 48 8b 09 42 8b 3c 39 49 8b 4c 24 58 48 8b 0c f9 <4c> 0f a3 b1 98 00 00 00 72 c5 f0 4c 0f ab b1 98 00 00 00 44 0f b7
[ 51.871491] RSP: e02b:ffffc900008afca8 EFLAGS: 00010282
[ 51.871501] RAX: 0000000000000000 RBX: ffffffff82541728 RCX: 0000000000000000
[ 51.871512] RDX: ffff88805d0fae70 RSI: ffff88805deaa940 RDI: 0000000000000001
[ 51.871523] RBP: ffff88805be8b720 R08: 0000000000000001 R09: ffffea0001699900
[ 51.871533] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88805be8b218
[ 51.871545] R13: ffff88805d0fae68 R14: 0000000000000001 R15: 0000000000000004
[ 51.871562] FS: 00007faa50fac040(0000) GS:ffff88805de00000(0000) knlGS:0000000000000000
[ 51.871573] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.871582] CR2: 0000000000000098 CR3: 000000005c6e6000 CR4: 0000000000002660

When starting it on another test dom0 to see if the direction of
movement matters, it mostly fails to boot with:

[Fri Jan 11 16:16:34 2019] BUG: unable to handle kernel paging request at ffff88805c61e9f0
[Fri Jan 11 16:16:34 2019] #PF error: [PROT] [WRITE]
[Fri Jan 11 16:16:34 2019] PGD 2410067 P4D 2410067 PUD 2c00067 PMD 5ff26067 PTE 801000005c61e065
[Fri Jan 11 16:16:34 2019] Oops: 0003 [#1] SMP NOPTI
[Fri Jan 11 16:16:34 2019] CPU: 3 PID: 1943 Comm: apt-get Not tainted 5.0.0-rc1 #1
[Fri Jan 11 16:16:34 2019] RIP: e030:move_page_tables+0x669/0x970
[Fri Jan 11 16:16:34 2019] Code: 8a 00 48 8b 03 31 ff 48 89 44 24 18 e8 c6 ab e7 ff 66 90 48 89 c6 48 89 df e8 c3 cc e7 ff 66 90 48 8b 44 24 18 b9 0c 00 00 00 <48> 89 45 00 48 8b 44 24 08 f6 40 52 40 0f 85 69 02 00 00 48 8b 44
[Fri Jan 11 16:16:34 2019] RSP: e02b:ffffc900008c7d70 EFLAGS: 00010282
[Fri Jan 11 16:16:34 2019] RAX: 0000000cb064b067 RBX: ffff88805c61ea58 RCX: 000000000000000c
[Fri Jan 11 16:16:34 2019] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000201
[Fri Jan 11 16:16:34 2019] RBP: ffff88805c61e9f0 R08: 0000000000000000 R09: 00000000000260a0
[Fri Jan 11 16:16:34 2019] R10: 0000000000007ff0 R11: ffff88805fd23000 R12: ffffea00017187a8
[Fri Jan 11 16:16:34 2019] R13: ffffea00017187a8 R14: 00007f04e9800000 R15: 00007f04e9600000
[Fri Jan 11 16:16:34 2019] FS: 00007f04ef355100(0000) GS:ffff88805df80000(0000) knlGS:0000000000000000
[Fri Jan 11 16:16:34 2019] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Jan 11 16:16:34 2019] CR2: ffff88805c61e9f0 CR3: 000000005c5fc000 CR4: 0000000000002660
[Fri Jan 11 16:16:34 2019] Call Trace:
[Fri Jan 11 16:16:34 2019] move_vma.isra.34+0xd1/0x2d0
[Fri Jan 11 16:16:34 2019] __x64_sys_mremap+0x1b3/0x370
[Fri Jan 11 16:16:34 2019] do_syscall_64+0x49/0x100
[Fri Jan 11 16:16:34 2019] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Fri Jan 11 16:16:34 2019] RIP: 0033:0x7f04ee2e227a
[Fri Jan 11 16:16:34 2019] Code: 73 01 c3 48 8b 0d 1e fc 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 19 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ee fb 2a 00 f7 d8 64 89 01 48
[Fri Jan 11 16:16:34 2019] RSP: 002b:00007fffb3da3e38 EFLAGS: 00000246 ORIG_RAX: 0000000000000019
[Fri Jan 11 16:16:34 2019] RAX: ffffffffffffffda RBX: 000056533fa1bf50 RCX: 00007f04ee2e227a
[Fri Jan 11 16:16:34 2019] RDX: 0000000001a00000 RSI: 0000000001900000 RDI: 00007f04e95ac000
[Fri Jan 11 16:16:34 2019] RBP: 0000000001a00000 R08: 2e8ba2e8ba2e8ba3 R09: 0000000000000040
[Fri Jan 11 16:16:34 2019] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f04e95ac060
[Fri Jan 11 16:16:34 2019] R13: 00007f04e95ac000 R14: 000056533fa45d73 R15: 000056534024bd10
[Fri Jan 11 16:16:34 2019] Modules linked in:
[Fri Jan 11 16:16:34 2019] CR2: ffff88805c61e9f0
[Fri Jan 11 16:16:34 2019] ---[ end trace 443702bd9ba5d6b2 ]---
[Fri Jan 11 16:16:34 2019] RIP: e030:move_page_tables+0x669/0x970
[Fri Jan 11 16:16:34 2019] Code: 8a 00 48 8b 03 31 ff 48 89 44 24 18 e8 c6 ab e7 ff 66 90 48 89 c6 48 89 df e8 c3 cc e7 ff 66 90 48 8b 44 24 18 b9 0c 00 00 00 <48> 89 45 00 48 8b 44 24 08 f6 40 52 40 0f 85 69 02 00 00 48 8b 44
[Fri Jan 11 16:16:34 2019] RSP: e02b:ffffc900008c7d70 EFLAGS: 00010282
[Fri Jan 11 16:16:34 2019] RAX: 0000000cb064b067 RBX: ffff88805c61ea58 RCX: 000000000000000c
[Fri Jan 11 16:16:34 2019] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000201
[Fri Jan 11 16:16:34 2019] RBP: ffff88805c61e9f0 R08: 0000000000000000 R09: 00000000000260a0
[Fri Jan 11 16:16:34 2019] R10: 0000000000007ff0 R11: ffff88805fd23000 R12: ffffea00017187a8
[Fri Jan 11 16:16:34 2019] R13: ffffea00017187a8 R14: 00007f04e9800000 R15: 00007f04e9600000
[Fri Jan 11 16:16:34 2019] FS: 00007f04ef355100(0000) GS:ffff88805df80000(0000) knlGS:0000000000000000
[Fri Jan 11 16:16:34 2019] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Jan 11 16:16:34 2019] CR2: ffff88805c61e9f0 CR3: 000000005c5fc000 CR4: 0000000000002660

I can log in over ssh, but a command like ps afxu hangs.

Oh, it seems that 5.0-rc1 is doing this all the time. Next time it's
after 500 seconds of uptime. xl destroy and trying again, it boots. The
first live migrate is successful (and no clockevents_switch_state
complaints), the second one explodes with blk_mq_update_nr_hw_queues
again.

Hmok, as long as I live migrate the 5.0-rc1 domU around between dom0s
with Xen 4.11.1-pre from commit 5acdd26fdc (the one we had in Debian
until yesterday) and Linux 4.19.9 in the dom0, it works. As soon as I
live migrate to the one box running the new Xen 4.11.1 package from
Debian unstable, and Linux 4.19.12, I get the
blk_mq_update_nr_hw_queues crash.

If I do the same with 4.19 in the domU, I don't get the
blk_mq_update_nr_hw_queues crash.

Now, back to 4.19.14 + guard_hole + v2, I can't seem to reproduce the
clockevents_switch_state warning any more. I'll take a break and then
try to find out whether I'm doing anything different than earlier
today, when I could reproduce it 100% consistently. O_o :)

Hans