I updated a few hypervisors and their VMs to CentOS 6.10 on Monday; today I awoke to an alert saying all VMs were down. It looks like a very old bug has crept back in.

The machine is a ProLiant DL380 G7 with a Xeon X5675 and 96 GB of RAM, running half a dozen smallish VMs. The hypervisor and all VMs run kernel 2.6.32-754.2.1.el6.x86_64.

Around the time the VMs must have gone down, there are quite a few error messages like the following in the system log:

Aug 16 03:10:13 hyper-7 kernel: [265397.382552] vmwrite error: reg 6000 value fffffffffffffff7 (err -9)
Aug 16 03:10:13 hyper-7 kernel: [265397.421372] Pid: 9375, comm: qemu-kvm Not tainted 2.6.32-754.2.1.el6.x86_64 #1
Aug 16 03:10:13 hyper-7 kernel: [265397.464985] Call Trace:
Aug 16 03:10:13 hyper-7 kernel: [265397.481530] [<ffffffffa0532a9c>] ? vmwrite_error+0x2c/0x30 [kvm_intel]
Aug 16 03:10:13 hyper-7 kernel: [265397.520737] [<ffffffffa0532ac0>] ? vmcs_writel+0x20/0x30 [kvm_intel]
Aug 16 03:10:13 hyper-7 kernel: [265397.560028] [<ffffffffa0535e63>] ? vmx_fpu_activate+0x93/0xc0 [kvm_intel]
Aug 16 03:10:14 hyper-7 kernel: [265397.600072] [<ffffffffa04cd1e7>] ? kvm_arch_vcpu_create+0x37/0x50 [kvm]
Aug 16 03:10:14 hyper-7 kernel: [265397.638183] [<ffffffffa04c72a1>] ? kvm_vm_ioctl+0x601/0x1050 [kvm]
Aug 16 03:10:14 hyper-7 kernel: [265397.674367] [<ffffffff8113f461>] ? free_one_page+0x191/0x440
Aug 16 03:10:14 hyper-7 kernel: [265397.708101] [<ffffffff811b4159>] ? vfs_ioctl+0x29/0xc0
Aug 16 03:10:14 hyper-7 kernel: [265397.739124] [<ffffffff81142d86>] ? __free_pages+0x46/0xa0
Aug 16 03:10:14 hyper-7 kernel: [265397.773193] [<ffffffff811b463a>] ? do_vfs_ioctl+0x3aa/0x590
Aug 16 03:10:14 hyper-7 kernel: [265397.805774] [<ffffffff81142e29>] ? free_pages+0x49/0x50
Aug 16 03:10:14 hyper-7 kernel: [265397.839147] [<ffffffff811b48a1>] ? sys_ioctl+0x81/0xa0
Aug 16 03:10:14 hyper-7 kernel: [265397.870109] [<ffffffff810f1d0e>] ? __audit_syscall_exit+0x25e/0x290
Aug 16 03:10:14 hyper-7 kernel: [265397.909358] [<ffffffff81560351>] ? system_call_fastpath+0x2f/0x34

Curiously, the messages don't seem to indicate anything fatal in and of themselves; there are two like this a minute after bootup and about a dozen more after roughly a day, none of which seems to have crashed anything. However, they are the only obvious anomaly I could find around that time, and since they are VT-x related, I reckon there's a connection.

The stack trace closely resembles this bug, which turned up in 2015 and was fixed long ago:

https://lkml.org/lkml/2015/7/3/288

Has anyone seen this recently, and can anyone confirm or refute any of my guesswork?

Cheers,
Matthias

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos