On Mon, Nov 11, 2019 at 06:48:59PM +0100, Greg Kroah-Hartman wrote: > On Mon, Nov 11, 2019 at 09:37:57AM -0800, Sean Christopherson wrote: > > On Mon, Nov 11, 2019 at 10:32:05AM +0100, Thomas Lamprecht wrote: > > > On 10/3/19 5:53 PM, Greg Kroah-Hartman wrote: > > > > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > > > > > > > > commit 16cfacc8085782dab8e365979356ce1ca87fd6cc upstream. > > > > > > > > Manually generate the PDPTR reserved bit mask when explicitly loading > > > > PDPTRs. The reserved bits that are being tracked by the MMU reflect the > > > It seems that a backport of this to stable and distro kernels tickled out > > > some issue[0] for KVM Linux 64bit guests on older than about 8-10 year old > > > Intel CPUs[1]. > > > > It manifests specifically when running with EPT disabled (no surprise > > there). Actually, it probably would reproduce simply with unrestricted > > guest disabled, but that's beside the point. > > > > The issue is a flawed PAE-paging check in kvm_set_cr3(), which causes KVM > > to incorrectly load PDPTRs in 64-bit mode and inject a #GP. It's a sneaky > > little bugger because the "if (is_long_mode() ..." makes it appear to be > > correct at first glance. > > > > if (is_long_mode(vcpu) && > > (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63))) > > return 1; > > else if (is_pae(vcpu) && is_paging(vcpu) && <--- needs !is_long_mode() > > !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) > > return 1; > > > > With unrestricted guest, KVM doesn't intercept writes to CR3 and so doesn't > > trigger the buggy code. This doesn't fail upstream because the offending > > code was refactored to encapsulate the PAE checks in a single helper, > > precisely to avoid this type of headache. > > > > commit bf03d4f9334728bf7c8ffc7de787df48abd6340e > > Author: Paolo Bonzini <pbonzini@xxxxxxxxxx> > > Date: Thu Jun 6 18:52:44 2019 +0200 > > > > KVM: x86: introduce is_pae_paging > > > > Checking for 32-bit PAE is quite common around code that fiddles with > > the PDPTRs. Add a function to compress all checks into a single > > invocation. > > > > > > Commit bf03d4f93347 ("KVM: x86: introduce is_pae_paging") doesn't apply > > cleanly to 4.19 or earlier because of the VMX file movement in 4.20. But, > > the revelant changes in x86.c do apply cleanly, and I've quadruple checked > > that the PAE checks in vmx.c are correct, i.e. applying the patch and > > ignoring the nested.c/vmx.c conflicts would be a viable lazy option. > > > > > Basically, booting this kernel as host, then running an KVM guest distro > > > or kernel fails it that guest kernel early in the boot phase without any > > > error or other log to serial console, earlyprintk. > > > > ... > > > > > > > > [0]: https://bugzilla.kernel.org/show_bug.cgi?id=205441 > > > [1]: models tested as problematic are: intel core2duo E8500; Xeon E5420; so > > > westmere, conroe and that stuff. AFAICT anything from about pre-2010 which > > > has VMX support (i.e. is 64bit based) > > > > Note, not Westmere, which has EPT and unrestricted guest. Xeon E5420 is > > Harpertown, a.k.a. Penryn, the shrink of Conroe. > > > Thanks for figuring this out, can you send us a patch that we can apply > to fix this issue in the stable tree? Can do. A custom backport will be need for 4.20 and earlier, not 4.19 and earlier. I misremembered when we did the VMX refactoring. For 5.0, 5.1 and 5.2, commit bf03d4f93347 can be applied directly.