On Wed, Aug 02, 2023, Amaan Cheval wrote:
> > Yeesh.  There is a ridiculous amount of potentially problematic activity.  KSM is
> > active in that trace, it looks like NUMA balancing might be in play,
>
> Sorry about the delayed response - it seems like the majority of locked up guest
> VMs stop throwing repeated EPT_VIOLATIONs as soon as we turn `numa_balancing`
> off.

LOL, NUMA autobalancing.  I have a longstanding hatred of that feature.  I'm sure
there are setups where it adds value, but from my perspective it's nothing but
pain and misery.

> They still remain locked up, but that might be because the original cause of the
> looping EPT_VIOLATIONs corrupted/crashed them in an unrecoverable way (are there
> any ways you can think of that that might happen)?

Define "remain locked up".  If the vCPUs are actively running in the guest and
making forward progress, i.e. not looping on VM-Exits on a single RIP, then they
aren't stuck from KVM's perspective.

But that doesn't mean the guest didn't take punitive action when a vCPU was
effectively stalled indefinitely by KVM, e.g. from the guest's perspective the
stuck vCPU will likely manifest as a soft lockup, and that could lead to a panic()
if the guest is a Linux kernel running with softlockup_panic=1.
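
To make the "looping on VM-Exits on a single RIP" criterion concrete, here is a
rough sketch of how one might check a captured trace for it.  This is only an
illustration, not an existing tool: the helper names, the window size and the
threshold are all made up, and it assumes the input has already been filtered
down to one vCPU thread and that each line carries "reason <NAME> rip 0x<hex>"
as in kvm_exit tracepoint output.

```python
import re
from collections import Counter

# Assumption: kvm_exit trace lines contain "reason <NAME> rip 0x<hex>";
# every other field on the line is ignored.
EXIT_RE = re.compile(r"kvm_exit:.*?reason (\S+) rip (0x[0-9a-fA-F]+)")


def parse_exits(lines):
    """Return (reason, rip) tuples for lines that look like kvm_exit events."""
    exits = []
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            exits.append((match.group(1), int(match.group(2), 16)))
    return exits


def looks_stuck(exits, window=1000, threshold=0.99):
    """Heuristic: True if one (reason, rip) pair dominates the last `window` exits.

    `window` and `threshold` are arbitrary illustrative defaults.
    """
    recent = exits[-window:]
    if len(recent) < window:
        return False  # not enough samples to say anything
    _, count = Counter(recent).most_common(1)[0]
    return count / len(recent) >= threshold
```

If looks_stuck() stops returning True once numa_balancing is turned off, the
vCPU is no longer stuck from KVM's perspective, and whatever is still wrong is
more likely fallout inside the guest.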