[Bug 198843] Google Compute Engine: Nested virtualization crashes kernel with L1 or L2 when using 4.13-4.15 kernels

https://bugzilla.kernel.org/show_bug.cgi?id=198843

--- Comment #2 from Ujwal Setlur (ujwal.setlur@xxxxxxxxx) ---
Hi David,

You could very well be right that this is a Google issue. I have also opened a
ticket on Stack Overflow, which is the support channel available to me at my
subscription level.

This exact same scenario works fine on VMware Fusion on a Mac, where L1 is
Ubuntu 16.04 and L2 is Ubuntu 16.04 as well. It also works on Microsoft Azure,
which uses Hyper-V as its L0 hypervisor. I imagine Google uses KVM as the L0
hypervisor, so I thought it would be a good idea to bring it to attention here
as well.

In terms of details of the crash, there are a couple of scenarios:

1. L1 = Ubuntu 16.04 (4.13); L2 = Debian 9 (4.9). L2 launches fine, but when I
shut down the instance, L1 panics. Here is the panic:

[  798.321144] general protection fault: 0000 [#1] SMP PTI
[  798.326531] Modules linked in: kvm_intel ip6table_filter ip6_tables iptable_filter ip_tables x_tables ppdev pvpanic input_leds parport_pc parport kvm serio_raw irqbypass ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd psmouse virtio_net [last unloaded: kvm_intel]
[  798.375664] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.13.0-1008-gcp #11-Ubuntu
[  798.383345] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  798.392678] task: ffff9c49f84a5d00 task.stack: ffffaaac062d0000
[  798.398722] RIP: 0010:native_write_cr4+0x4/0x10
[  798.403369] RSP: 0018:ffff9c49ff3c3f48 EFLAGS: 00010006
[  798.408718] RAX: 00000000001626e0 RBX: ffff9c49ff3e3c68 RCX: ffff9c49ff3e3c90
[  798.416060] RDX: ffff9c49ff3d4020 RSI: 0000000000000007 RDI: 00000000001606e0
[  798.424651] RBP: ffff9c49ff3c3f48 R08: 0000000000000008 R09: 0000000000000002
[  798.432718] R10: 0000000000000283 R11: 000000010001e6b4 R12: 0000000000023c90
[  798.439979] R13: 0000000000000007 R14: 0000000000000000 R15: 0000000000000000
[  798.447234] FS:  0000000000000000(0000) GS:ffff9c49ff3c0000(0000) knlGS:0000000000000000
[  798.455443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  798.461473] CR2: 00007fd38267b990 CR3: 000000100040a006 CR4: 00000000001626e0
[  798.468726] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  798.476054] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  798.483296] Call Trace:
[  798.485855]  <IRQ>
[  798.487988]  hardware_disable+0x91/0xa0 [kvm_intel]
[  798.492996]  kvm_arch_hardware_disable+0x13/0x40 [kvm]
[  798.498254]  hardware_disable_nolock+0x2f/0x40 [kvm]
[  798.503326]  flush_smp_call_function_queue+0x72/0x110
[  798.508480]  generic_smp_call_function_single_interrupt+0x13/0x30
[  798.514777]  smp_trace_call_function_single_interrupt+0x27/0x40
[  798.520798]  smp_call_function_interrupt+0xe/0x10
[  798.525606]  call_function_interrupt+0x1af/0x1c0
[  798.530332]  </IRQ>
[  798.532533] RIP: 0010:native_safe_halt+0x6/0x10
[  798.537168] RSP: 0018:ffffaaac062d3e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff03
[  798.544932] RAX: 0000000000000000 RBX: ffff9c49f84a5d00 RCX: 0000000000000000
[  798.552437] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  798.559675] RBP: ffffaaac062d3e80 R08: 0000000000000000 R09: 0000000000000002
[  798.566919] R10: 0000000000000283 R11: 000000010001e6b4 R12: 0000000000000007
[  798.574162] R13: ffff9c49f84a5d00 R14: 0000000000000000 R15: 0000000000000000
[  798.581764]  default_idle+0x1e/0x100
[  798.585694]  arch_cpu_idle+0xf/0x20
[  798.589289]  default_idle_call+0x23/0x30
[  798.593316]  do_idle+0x172/0x1f0
[  798.596663]  cpu_startup_entry+0x73/0x80
[  798.600790]  start_secondary+0x175/0x1b0
[  798.604834]  secondary_startup_64+0x9f/0xa0
[  798.609130] Code: 0f 1f 80 00 00 00 00 55 48 89 e5 0f 20 d8 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 0f 22 df 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 <0f> 22 e7 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 44 0f 20 c0 5d
[  798.628728] RIP: native_write_cr4+0x4/0x10 RSP: ffff9c49ff3c3f48
[  798.634857] ---[ end trace 2481ff8197fed04a ]---
[  798.639587] Kernel panic - not syncing: Fatal exception in interrupt
[  798.647396] Kernel Offset: 0x11e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
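
For anyone reading the register dump above: the fault is in native_write_cr4,
CR4 currently holds 0x1626e0, and RDI (the first argument under the x86-64
calling convention, and the source register of the faulting "0f 22 e7"
instruction) holds 0x1606e0. The rough Python sketch below decodes which CR4
bits differ between the two values. The bit names are taken from the Intel SDM;
the helper name decode_cr4 is just mine for illustration. It only decodes the
values and does not explain why the write faults.

```python
# Subset of CR4 bit positions and names, as listed in the Intel SDM.
CR4_BITS = {
    0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 5: "PAE",
    6: "MCE", 7: "PGE", 8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT",
    11: "UMIP", 12: "LA57", 13: "VMXE", 14: "SMXE", 16: "FSGSBASE",
    17: "PCIDE", 18: "OSXSAVE", 20: "SMEP", 21: "SMAP", 22: "PKE",
}


def decode_cr4(value):
    """Return the names of the CR4 bits set in value (hypothetical helper)."""
    return [name for bit, name in sorted(CR4_BITS.items()) if value & (1 << bit)]


current_cr4 = 0x1626E0  # CR4 field from the panic's register dump
attempted = 0x1606E0    # RDI, the value native_write_cr4 tried to write

changed = current_cr4 ^ attempted
print("bits set now:   ", decode_cr4(current_cr4))
print("bits in request:", decode_cr4(attempted))
print("bits that differ:", decode_cr4(changed))
```

The only differing bit is bit 13 (VMXE), which seems consistent with
hardware_disable in kvm_intel trying to turn VMX off on that CPU.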


2. L1 = Ubuntu 16.04; L2 = Ubuntu 16.04. Here, the instance boots up very slowly
and gets stuck at this point for a very long time:

[   14.512192] blk_update_request: I/O error, dev fd0, sector 0
[   14.564176] blk_update_request: I/O error, dev fd0, sector 0
[   14.616165] blk_update_request: I/O error, dev fd0, sector 0
[   14.760178] blk_update_request: I/O error, dev fd0, sector 0
[   14.816151] blk_update_request: I/O error, dev fd0, sector 0
[   14.872136] blk_update_request: I/O error, dev fd0, sector 0
[   14.924164] blk_update_request: I/O error, dev fd0, sector 0
[   14.980173] blk_update_request: I/O error, dev fd0, sector 0
[   15.032145] blk_update_request: I/O error, dev fd0, sector 0
[   15.216139] blk_update_request: I/O error, dev fd0, sector 0
[   20.948289] blk_update_request: I/O error, dev fd0, sector 0
[   21.004241] blk_update_request: I/O error, dev fd0, sector 0
[   21.060286] blk_update_request: I/O error, dev fd0, sector 0
[   21.132298] blk_update_request: I/O error, dev fd0, sector 0

This is what I see on the console of L1:

[  343.272418] kvm [3641]: vcpu0, guest rIP: 0xffffffff81064518 disabled perfctr wrmsr: 0xc2 data 0xffff
Feb 20 19:50:09 fogbot-ubuntu-1 kernel: [  343.272418] kvm [3641]: vcpu0, guest rIP: 0xffffffff81064518 disabled perfctr wrmsr: 0xc2 data 0xffff
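
In case it helps with triage, here is a small Python sketch that pulls the
fields out of the kvm message above. The regular expression is my own guess at
the message layout based on this one line, and the MSR name table is a
two-entry subset from the Intel SDM (0xc1 = IA32_PMC0, 0xc2 = IA32_PMC1), so
treat it as illustrative only.

```python
import re

# The console line from L1, copied from this report.
LINE = ("[  343.272418] kvm [3641]: vcpu0, guest rIP: 0xffffffff81064518 "
        "disabled perfctr wrmsr: 0xc2 data 0xffff")

# Pattern inferred from the single line above (an assumption, not a spec).
PATTERN = re.compile(
    r"kvm \[(?P<pid>\d+)\]: vcpu(?P<vcpu>\d+), guest rIP: (?P<rip>0x[0-9a-f]+) "
    r"disabled perfctr wrmsr: (?P<msr>0x[0-9a-f]+) data (?P<data>0x[0-9a-f]+)"
)

# Architectural performance counter MSRs, per the Intel SDM (subset).
MSR_NAMES = {0xC1: "IA32_PMC0", 0xC2: "IA32_PMC1"}


def parse_wrmsr(line):
    """Return the parsed fields of a 'disabled perfctr wrmsr' line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    msr = int(match.group("msr"), 16)
    return {
        "pid": int(match.group("pid")),
        "vcpu": int(match.group("vcpu")),
        "rip": int(match.group("rip"), 16),
        "msr": msr,
        "msr_name": MSR_NAMES.get(msr, "unknown"),
        "data": int(match.group("data"), 16),
    }


info = parse_wrmsr(LINE)
print(info)
```

So the L2 guest's vcpu0 tried to write 0xffff to a performance counter MSR, and
L1's KVM logged the write as disabled.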

Eventually, I see the login prompt on L2, but then at some point L1 dies.

Hope this helps.



