On Wed, Jul 15, 2020 at 04:49:57PM -0400, Boris Ostrovsky wrote:
> On 7/15/20 3:49 PM, Anchal Agarwal wrote:
> > On Mon, Jul 13, 2020 at 03:43:33PM -0400, Boris Ostrovsky wrote:
> >> On 7/10/20 2:17 PM, Agarwal, Anchal wrote:
> >>> Gentle ping on this series.
> >>
> >> Have you tested save/restore?
> >>
> > No, not with the last few series. But a good point, I will test that and get
> > back to you. Do you see anything specific in the series that suggests otherwise?
>
>
> root@ovs104> xl save pvh saved
> Saving to saved new xl format (info 0x3/0x0/1699)
> xc: info: Saving domain 3, type x86 HVM
> xc: Frames: 1044480/1044480 100%
> xc: End of stream: 0/0 0%
> root@ovs104> xl restore saved
> Loading new save file saved (new xl fmt info 0x3/0x0/1699)
> Savefile contains xl domain config in JSON format
> Parsing config from <saved>
> xc: info: Found x86 HVM domain from Xen 4.13
> xc: info: Restoring domain
> xc: info: Restore successful
> xc: info: XenStore: mfn 0xfeffc, dom 0, evt 1
> xc: info: Console: mfn 0xfefff, dom 0, evt 2
> root@ovs104> xl console pvh
> [ 139.943872] ------------[ cut here ]------------
> [ 139.943872] kernel BUG at arch/x86/xen/enlighten.c:205!
> [ 139.943872] invalid opcode: 0000 [#1] SMP PTI
> [ 139.943872] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.8.0-rc5 #26
> [ 139.943872] RIP: 0010:xen_vcpu_setup+0x16d/0x180
> [ 139.943872] Code: 4a 8b 14 f5 40 c9 1b 82 48 89 d8 48 89 2c 02 8b 05
> a4 d4 40 01 85 c0 0f 85 15 ff ff ff 4a 8b 04 f5 40 c9 1b 82 e9 f4 fe ff
> ff <0f> 0b b8 ed ff ff ff e9 14 ff ff ff e8 12 4f 86 00 66 90 66 66 66
> [ 139.943872] RSP: 0018:ffffc9000006bdb0 EFLAGS: 00010046
> [ 139.943872] RAX: 0000000000000000 RBX: ffffc9000014fe00 RCX:
> 0000000000000000
> [ 139.943872] RDX: ffff88803fc00000 RSI: 0000000000016128 RDI:
> 0000000000000000
> [ 139.943872] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> [ 139.943872] R10: ffffffff826174a0 R11: ffffc9000006bcb4 R12:
> 0000000000016120
> [ 139.943872] R13: 0000000000016120 R14: 0000000000016128 R15:
> 0000000000000000
> [ 139.943872] FS: 0000000000000000(0000) GS:ffff88803fc00000(0000)
> knlGS:0000000000000000
> [ 139.943872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 139.943872] CR2: 00007f704be8b000 CR3: 000000003901a004 CR4:
> 00000000000606f0
> [ 139.943872] Call Trace:
> [ 139.943872] ? __kmalloc+0x167/0x260
> [ 139.943872] ? xen_manage_runstate_time+0x14a/0x170
> [ 139.943872] xen_vcpu_restore+0x134/0x170
> [ 139.943872] xen_hvm_post_suspend+0x1d/0x30
> [ 139.943872] xen_arch_post_suspend+0x13/0x30
> [ 139.943872] xen_suspend+0x87/0x190
> [ 139.943872] multi_cpu_stop+0x6d/0x110
> [ 139.943872] ? stop_machine_yield+0x10/0x10
> [ 139.943872] cpu_stopper_thread+0x47/0x100
> [ 139.943872] smpboot_thread_fn+0xc5/0x160
> [ 139.943872] ? sort_range+0x20/0x20
> [ 139.943872] kthread+0xfe/0x140
> [ 139.943872] ? kthread_park+0x90/0x90
> [ 139.943872] ret_from_fork+0x22/0x30
> [ 139.943872] Modules linked in:
> [ 139.943872] ---[ end trace 74716859a6b4f0a8 ]---
> [ 139.943872] RIP: 0010:xen_vcpu_setup+0x16d/0x180
> [ 139.943872] Code: 4a 8b 14 f5 40 c9 1b 82 48 89 d8 48 89 2c 02 8b 05
> a4 d4 40 01 85 c0 0f 85 15 ff ff ff 4a 8b 04 f5 40 c9 1b 82 e9 f4 fe ff
> ff <0f> 0b b8 ed ff ff ff e9 14 ff ff ff e8 12 4f 86 00 66 90 66 66 66
> [ 139.943872] RSP: 0018:ffffc9000006bdb0 EFLAGS: 00010046
> [ 139.943872] RAX: 0000000000000000 RBX: ffffc9000014fe00 RCX:
> 0000000000000000
> [ 139.943872] RDX: ffff88803fc00000 RSI: 0000000000016128 RDI:
> 0000000000000000
> [ 139.943872] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> [ 139.943872] R10: ffffffff826174a0 R11: ffffc9000006bcb4 R12:
> 0000000000016120
> [ 139.943872] R13: 0000000000016120 R14: 0000000000016128 R15:
> 0000000000000000
> [ 139.943872] FS: 0000000000000000(0000) GS:ffff88803fc00000(0000)
> knlGS:0000000000000000
> [ 139.943872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 139.943872] CR2: 00007f704be8b000 CR3: 000000003901a004 CR4:
> 00000000000606f0
> [ 139.943872] Kernel panic - not syncing: Fatal exception
> [ 139.943872] Shutting down cpus with NMI
> [ 143.927559] Kernel Offset: disabled
> root@ovs104>
>
I think I may have found the bug. There were no issues with the V1 series that
I sent; the issues appear with V2. I tested both series and found that xl
save/restore works in V1 but not in V2. I should have tested it earlier.

Anyway, it looks like the issue comes from executing the syscore ops registered
for the hibernation use case during the call to xen_suspend(). I remember your
earlier comment asking why we need to check the xen_suspend mode in
xen_syscore_suspend [patch-004]. I removed that check based on my theoretical
understanding of your suggestion: since the lock_system_sleep() lock is taken,
we cannot initiate hibernation.

What I failed to check is the part of the code where, during xen_suspend(), all
registered syscore_ops suspend callbacks are called. The ones registered for PM
hibernation are therefore called as well. With no check on the suspend mode
there, the callback does not return early, even though these callbacks should
never be executed in the case of a Xen suspend.

I will restore part of that check from V1 in Patch-004 and send an updated
patch with the fix.

Thanks,
Anchal
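
For reference, the mode check described above can be modelled in plain userspace C. This is only a sketch of the idea, not the code from Patch-004: the enum, the ops struct, and every function name below are hypothetical stand-ins, and the real kernel interfaces are not used. It shows a suspend path that calls every registered callback regardless of mode, and a hibernation-only callback that returns early unless hibernation is the active mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical suspend modes; the names are illustrative only. */
enum suspend_mode { MODE_NONE, MODE_XEN_SUSPEND, MODE_PM_HIBERNATION };

/* Mode of the suspend operation currently in progress. */
enum suspend_mode current_mode = MODE_NONE;

/* Counts how often the hibernation-specific work actually ran. */
int hibernation_work_done;

/* Simplified stand-in for a table of suspend callbacks. */
struct syscore_ops_sketch {
    int (*suspend)(void);
};

/* Callback registered for the hibernation use case. */
static int hibernation_syscore_suspend(void)
{
    /* The guard: do nothing unless hibernation is the active mode. */
    if (current_mode != MODE_PM_HIBERNATION)
        return 0;

    hibernation_work_done++;
    return 0;
}

/* A callback that is valid for every suspend mode. */
static int generic_syscore_suspend(void)
{
    return 0;
}

static const struct syscore_ops_sketch registered_ops[] = {
    { .suspend = generic_syscore_suspend },
    { .suspend = hibernation_syscore_suspend },
};

/*
 * Runs every registered suspend callback, whatever the mode is.
 * Returns 0 on success or the first non-zero callback result.
 */
int run_syscore_suspend(enum suspend_mode mode)
{
    size_t i;
    int ret = 0;

    current_mode = mode;
    for (i = 0; i < sizeof(registered_ops) / sizeof(registered_ops[0]); i++) {
        assert(registered_ops[i].suspend != NULL);
        ret = registered_ops[i].suspend();
        if (ret)
            break;
    }
    current_mode = MODE_NONE;
    return ret;
}
```

Calling run_syscore_suspend(MODE_XEN_SUSPEND) leaves hibernation_work_done untouched, while run_syscore_suspend(MODE_PM_HIBERNATION) increments it. Without the guard in hibernation_syscore_suspend(), both calls would run the hibernation work, which is the situation the reply above describes.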