Well, the weird thing is that this is hypervisor-specific. KVM=kaboom. VirtualBox is happy, and we can't make this happen on roughly analogous ESX hosts. I can't directly test on my (Ubuntu) laptop because the driver won't build on the too-new Ubuntu 20.04.2 "Hardware enablement" kernel.

But either all the other hypervisors are doing this wrong and allowing this access, or KVM is. Not being a kernel expert makes this interesting. I'm passing the possibility list over the wall to the kernel folks, but most of the evidence we're seeing **seems** to point to KVM...

On Fri, May 20, 2022 at 11:22 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Fri, May 20, 2022, Brian Cowan wrote:
> > Disabling smap seems to fix the problem...
>
> Mwhahaha, I should have found someone to bet me real money :-)
>
> > Now for the hard question: WHY?
>
> The most likely scenario is that there's a SMAP violation (#PF due to a kernel
> access to user data without an override to tell the CPU that the access is intentional)
> somewhere in the guest that crashes/panics the guest kernel. Assuming that's the
> case, there are three-ish possibilities:
>
> 1. There's a bug in your company's custom kernel driver.
> 2. There's a SMAP violation somewhere else in RHEL 7.8, which is an 8+ year old
>    frankenkernel...
> 3. There's a bug in your version of KVM related to SMAP virtualization
>
> #3 begs the question, does this fail on bare metal that supports SMAP? If so,
> then that rules out #3.
>
> If the crash occurs only when doing stuff related to your custom driver, #1 is
> most likely the culprit.
>
> One way to try and debug further would be to disable EPT in KVM (load kvm_intel with
> ept=0) and then use KVM tracepoints to see when the guest dies. If it's a SMAP
> violation, there should be an injected SMAP #PF shortly before the guest dies.
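
As a rough aid for acting on the suggestions quoted above, here is a minimal Python sketch. It is not from the thread, and the function names are hypothetical. It checks whether a machine advertises the smap CPU flag in /proc/cpuinfo, reads the kvm_intel ept parameter from sysfs, and tests whether a #PF error code (for example, one read off an injected-exception trace line) is consistent with a SMAP violation. It assumes the standard x86 page-fault error-code bit layout and that the ept parameter reads back as Y/N (or 1/0). An error code alone cannot prove a SMAP violation, because the faulting address must also belong to a user-mode page.

```python
from pathlib import Path

# x86 page-fault error-code bits (standard architectural layout).
PF_PRESENT = 1 << 0   # fault on a present page (protection violation)
PF_USER = 1 << 2      # access originated in user mode
PF_RSVD = 1 << 3      # reserved bit set in a paging entry
PF_INSTR = 1 << 4     # fault on an instruction fetch


def cpu_has_flag(cpuinfo_text, flag):
    """Return True if `flag` appears on any 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags" and flag in value.split():
            return True
    return False


def ept_enabled(param_text):
    """Interpret the text of /sys/module/kvm_intel/parameters/ept."""
    return param_text.strip() in ("Y", "1")


def consistent_with_smap(error_code):
    """True if a #PF error code *could* be a SMAP violation.

    A SMAP violation is a supervisor-mode data access to a present
    user page, so: present, not user-mode, not a fetch, no reserved bit.
    This is necessary, not sufficient; the address must be a user page.
    """
    return (
        bool(error_code & PF_PRESENT)
        and not error_code & (PF_USER | PF_RSVD | PF_INSTR)
    )


def read_or_none(path):
    """Read a file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


if __name__ == "__main__":
    cpuinfo = read_or_none("/proc/cpuinfo")
    if cpuinfo is not None:
        print("smap flag present:", cpu_has_flag(cpuinfo, "smap"))
    ept = read_or_none("/sys/module/kvm_intel/parameters/ept")
    if ept is None:
        print("kvm_intel not loaded (or not an Intel KVM host)")
    else:
        print("EPT enabled:", ept_enabled(ept))
```

Running it on the bare-metal box, on the KVM host, and inside the guest shows whether SMAP is exposed at each level and whether EPT is already off before tracing.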