Looking also at the other crash [0]:

        msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
ffffffff811f65b7:       e8 44 cb 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
ffffffff811f65bc:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
ffffffff811f65c1:       48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
ffffffff811f65c8:       fc ff df
ffffffff811f65cb:       48 c1 ea 03             shr    $0x3,%rdx
ffffffff811f65cf:       80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)     <- fault here.
ffffffff811f65d3:       0f 85 36 19 00 00       jne    ffffffff811f7f0f <vmx_vcpu_run+0x236f>

%rdx should contain a pointer to loaded_vmcs. It is loaded directly from the stack [0x8(%rsp)]. This same stack location was used just before the inline assembly for VMRESUME/VMLAUNCH here:

        vmx->__launched = vmx->loaded_vmcs->launched;
ffffffff811f639f:       e8 5c cd 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
ffffffff811f63a4:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
ffffffff811f63a9:       48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
ffffffff811f63b0:       fc ff df
ffffffff811f63b3:       48 c1 ea 03             shr    $0x3,%rdx
ffffffff811f63b7:       80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)     <- used here.

... and this stack location was never touched by anything in between! So something must have corrupted the stack itself, not the kvm_vcpu struct.

Obviously the inline assembly block uses the stack as well, but I cannot see anything there that would cause this corruption.

That being said, look at the %rsp and %rbp values dumped in the stack trace:

RSP: ffff8801b7d7f380 RBP: ffff8801b8260140

... they are almost 4.9 MiB (0x4e0dc0 bytes) apart! Shouldn't these two registers be a bit closer to each other? :)

So there are two possibilities here:

1- %rsp is wrong

   That would explain why the loaded_vmcs was NULL. However, it is a bit harder to understand how it became wrong! It should have been restored during the VMEXIT from the HOST_RSP value in the VMCS!

   Is this a nested setup?

2- %rbp is wrong

   That would also explain why the loaded_vmcs was NULL.
   Whatever corrupted the stack and caused loaded_vmcs to be NULL could also have corrupted the %rbp saved on the stack. That would mean it happened during a function call. All function calls made between the point where the stack was sane (just before the "asm" block for VMLAUNCH) and the crash site are kcov-related only. Looking at kcov, I cannot see where the stack would get corrupted, though!

Obviously, another source of corruption could be a completely unrelated thread directly corrupting this thread's memory.

Maybe it would be easier to just try to reproduce it first and see which one is true (if either).

[0] https://syzkaller.appspot.com/bug?extid=cc483201a3c6436d3550

On Thu, 2018-06-28 at 10:18 -0700, Jim Mattson wrote:
>   22:   0f 01 c3                vmresume
>   25:   48 89 4c 24 08          mov    %rcx,0x8(%rsp)
>   2a:   59                      pop    %rcx
>
> <rip>:
>   2b:   0f 96 81 88 56 00 00    setbe  0x5688(%rcx)
>   32:   48 89 81 00 03 00 00    mov    %rax,0x300(%rcx)
>   39:   48 89 99 18 03 00 00    mov    %rbx,0x318(%rcx)
>
> %rcx should be pointing to the vcpu_vmx structure, but it's not even
> canonical: 1ffff10035842e78.
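The KASAN check in the first listing (shr $0x3, then cmpb against the 0xdffffc0000000000 offset) and the register values in this thread can be sanity-checked with a few lines of C. This is a minimal user-space sketch, assuming the generic x86-64 KASAN layout (shadow scale shift 3, shadow offset 0xdffffc0000000000 as loaded by the movabs) and 48-bit virtual addresses. The helper names are hypothetical, not kernel APIs. It illustrates why a NULL (or small) pointer faults at the cmpb itself: its shadow address is non-canonical, so the shadow read raises #GP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed generic KASAN parameters on x86-64; they match the
 * "shr $0x3" and "movabs $0xdffffc0000000000" in the listing. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Hypothetical helper: address of the shadow byte that
 * "cmpb $0x0,(%rdx,%rax,1)" reads before the real access to addr. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
    return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Hypothetical helper: with 48-bit virtual addresses an address is
 * canonical only if bits 63..47 are all zeros or all ones. Accessing
 * a non-canonical address raises #GP rather than a page fault. */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47; /* the 17 bits 63..47 */
    return top == 0 || top == 0x1ffff;
}

/* Hypothetical helper: byte distance between two stack addresses;
 * hi must not be below lo. */
static uint64_t stack_distance(uint64_t lo, uint64_t hi)
{
    assert(hi >= lo);
    return hi - lo;
}
```

For example, kasan_shadow_addr(0x20) (0x20 being a made-up small field offset off a NULL base) gives 0xdffffc0000000004, which is not canonical. The RSP value from the trace is canonical, the %rcx value from Jim's reply is not, and stack_distance() on the RSP/RBP pair gives 0x4e0dc0 bytes.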