2017-02-05 16:21 GMT+03:00 Wanpeng Li <kernellwp@xxxxxxxxx>:
> 2017-02-05 16:39 GMT+08:00 Matwey V. Kornilov <matwey.kornilov@xxxxxxxxx>:
>> Hello,
>>
>> I've bisected that commit defcf51fa93929bd ("KVM: x86: allow TSC
>> deadline timer on all hosts") makes guest kernels crash under specific
>> circumstances.
>> The issue itself is the following. I use a host Linux kernel (the one
>> bisected) to run guest Linux kernels using qemu-kvm (version 2.3.1;
>> the earlier version 2.1 has also been checked and shows the same
>> behavior).
>>
>> I've found that
>>
>> 1) the following qemu command
>>
>> qemu-system-x86_64 -machine accel=kvm -nodefaults -no-reboot
>> -nographic -cpu host -vga none -kernel kernel -initrd initrd -append
>> 'panic=1 no-kvmclock console=ttyS0 loglevel=7' -m 1024 -serial stdio
>>
>> successfully boots the guest kernel when the host kernel version is
>> prior to defcf51fa93929bd (3.16, 3.18, and 3.19 were checked)
>>
>> 2) the same command leads to a guest kernel failure with the same
>> qemu binary and the same kernel and initrd images when host kernel
>> 4.0+ is used (4.0, 4.4, and 4.9 were checked):
>>
>> [ 0.588000] divide error: 0000 [#1] SMP
>> [ 0.588000] Modules linked in:
>> [ 0.588000] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W
>> 3.16.6-2-default #1
>> [ 0.588000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org
>> 04/01/2014
>> [ 0.588000] task: ffff88003d61e010 ti: ffff88003d63c000 task.ti:
>> ffff88003d63c000
>> [ 0.588000] RIP: 0010:[<ffffffff810c6e7f>] [<ffffffff810c6e7f>]
>> clockevents_config.part.3+0x1f/0xa0
>> [ 0.588000] RSP: 0000:ffff88003d63fe90 EFLAGS: 00010246
>> [ 0.588000] RAX: ffffffffffffffff RBX: ffff88003f80ce80 RCX: 0000000000000000
>> [ 0.588000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff
>> [ 0.588000] RBP: 000000000000b060 R08: 0000000000000001 R09: ffffffff81ee84b0
>> [ 0.588000] R10: 00000000000000bb R11: 0000000000000003 R12: 000000000000b0a0
>> [ 0.588000] R13: 0000000000000200 R14: 0000000000000000 R15: 0000000000000000
>> [ 0.588000] FS: 0000000000000000(0000) GS:ffff88003f800000(0000)
>> knlGS:0000000000000000
>> [ 0.588000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [ 0.588000] CR2: 00000000ffffffff CR3: 0000000001c13000 CR4: 00000000000006f0
>> [ 0.588000] Stack:
>> [ 0.588000] ffff88003f80ce80 000000000000b060 000000000000b0a0
>> ffffffff810c73cc
>> [ 0.588000] 0000000000000000 ffffffff81d15952 ffffffff81e63338
>> 0000000000000000
>> [ 0.588000] 0000000000000000 0000000000000000 ffffffff81d0910c
>> 0000000000000000
>> [ 0.588000] Call Trace:
>> [ 0.588000] [<ffffffff810c73cc>] clockevents_config_and_register+0x1c/0x30
>> [ 0.588000] [<ffffffff81d15952>] native_smp_prepare_cpus+0x3a1/0x3d0
>> [ 0.588000] [<ffffffff81d0910c>] kernel_init_freeable+0xc1/0x202
>> [ 0.588000] [<ffffffff815bc04a>] kernel_init+0xa/0xf0
>> [ 0.588000] [<ffffffff815d0b7c>] ret_from_fork+0x7c/0xb0
>> [ 0.588000] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66
>> 90 41 54 31 d2 89 f1 89 f6 41 b8 01 00 00 00 55 53 48 89 fb 48 8b 7f
>> 70 48 89 f8 <48> f7 f6 48 85 c0 74 0b 48 3d 58 02 00 00 41 89 c0 77 4e
>> 4c 8d
>> [ 0.588000] RIP [<ffffffff810c6e7f>] clockevents_config.part.3+0x1f/0xa0
>> [ 0.588000] RSP <ffff88003d63fe90>
>> [ 0.592000] ---[ end trace 6dcb37223984f47d ]---
>> [ 0.596000] Kernel panic - not syncing: Attempted to kill init!
>> exitcode=0x0000000b
>>
>> Since the guest kernel was bootable before in this configuration,
>> I think this could be a regression. But I am not sure how exactly
>> the commit affects the behavior.
>>
>> I've also tried to understand what was happening inside the guest
>> kernel native_calibrate_tsc() function. Nothing interesting except
>> that both tsc1 and tsc2 are ULLONG_MAX after the for loop.
>>
>> References: https://bugzilla.opensuse.org/show_bug.cgi?id=1023358
>
> It is a guest kernel bug, please refer to commit b47dcbdc516 (x86,
> apic: Handle a bad TSC more gracefully).
>

Hello,

Frankly speaking, it is never clear to me where the criterion lies.
If it is a bug in the guest kernel, why did it ever work with older
host kernels?

--
With best regards,
Matwey V. Kornilov
http://blog.matwey.name
xmpp://0x2207@xxxxxxxxx