On Wed, Apr 05, 2023, Jeremi Piotrowski wrote:
> On 3/7/2023 6:36 PM, Sean Christopherson wrote:
> > Thinking about this more, I would rather revert commit 1e0c7d40758b ("KVM: SVM:
> > hyper-v: Remote TLB flush for SVM") or fix the thing properly straightaway. KVM
> > doesn't magically handle the flushes correctly for the shadow/legacy MMU, KVM just
> > happens to get lucky and not run afoul of the underlying bugs. The revert appears
> > to be reasonably straightforward (see bottom).
>
> Hi Sean,
>
> I'm back, and I don't have good news. The fix for the missing Hyper-V TLB flushes has
> landed in Linus' tree, and I've now had the chance to test things outside Azure, in WSL
> on my AMD laptop.
>
> There is some seriously weird interaction going on between the TDP MMU and Hyper-V, with
> or without enlightened TLB. My laptop has 16 vCPUs, so the WSL VM also has 16 vCPUs.
> I have hardcoded the kernel to disable enlightened TLB (so we know that is not interfering).
> I'm running a Flatcar Linux VM inside the WSL VM using legacy BIOS, a single CPU,
> and 4GB of RAM.
>
> If I run with `kvm.tdp_mmu=0`, I can boot and shut down my VM consistently in 20 seconds.
>
> If I run with the TDP MMU, the VM boot stalls for seconds at a time in various spots
> (loading grub, decompressing the kernel, during kernel boot); the boot output feels like
> it's happening in slow motion. The fastest I've seen it finish the same cycle is 2 minutes,
> I have also seen it take 4 minutes, and sometimes it doesn't finish at all. Same everything;
> the only difference is the value of `kvm.tdp_mmu`.

When a stall occurs, can you tell where the time is lost? E.g. is the CPU stuck
in L0, L1, or L2? L2 being a single vCPU rules out quite a few scenarios, e.g.
lock contention and whatnot.

If you can run perf in WSL, that might be the easiest way to suss out what's going on.

> So I would like to revisit disabling the TDP MMU on Hyper-V altogether for the time being,
> but it should probably be with the following condition:
>
>   tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled && !hypervisor_is_type(X86_HYPER_MS_HYPERV)
>
> Do you have an environment where you would be able to reproduce this? A Windows Server
> perhaps, or an AMD laptop?

Hrm, not easily, no.

Can you try two things?

  1. Linus' tree on Intel hardware
  2. kvm-x86/next[*] on Intel hardware

Don't bother with #2 if #1 (Linus' tree) does NOT suffer the same stalls as AMD. #2 is
interesting iff Intel is also affected, as kvm-x86/next has an optimization for CR0.WP
toggling, which was the Achilles' heel of the TDP MMU. If Intel isn't affected, then
something other than CR0.WP is to blame.

I fully expect both experiments to show the same behavior as AMD, but if for some reason
they don't, the results should help narrow the search.

[*] https://github.com/kvm-x86/linux/tree/next
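
If we do end up going the disable route, here's a rough sketch of how I read Jeremi's
condition. Completely untested, and I'm assuming kvm_configure_mmu() in
arch/x86/kvm/mmu/mmu.c is still the one place that computes tdp_mmu_enabled, and that
<asm/hypervisor.h> is (or gets) pulled in there for hypervisor_is_type():

#ifdef CONFIG_X86_64
	/*
	 * Sketch only: keep the TDP MMU off when KVM itself is running on
	 * Hyper-V, so shadow paging stays the default on that setup until
	 * the stalls are understood.
	 */
	tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled &&
			  !hypervisor_is_type(X86_HYPER_MS_HYPERV);
#endif

That should leave the kvm.tdp_mmu module param plumbing untouched and only change what
tdp_mmu_enabled ends up as, but again, I haven't tried it.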