On Wed, Apr 05, 2023, Jeremi Piotrowski wrote:
> On 3/7/2023 6:36 PM, Sean Christopherson wrote:
> > Thinking about this more, I would rather revert commit 1e0c7d40758b ("KVM: SVM:
> > hyper-v: Remote TLB flush for SVM") or fix the thing properly straightaway. KVM
> > doesn't magically handle the flushes correctly for the shadow/legacy MMU, KVM just
> > happens to get lucky and not run afoul of the underlying bugs. The revert appears
> > to be reasonably straightforward (see bottom).
>
> Hi Sean,
>
> I'm back, and I don't have good news. The fix for the missing Hyper-V TLB flushes has
> landed in Linus' tree, and I've now had the chance to test things outside Azure, in WSL
> on my AMD laptop.
>
> There is some seriously weird interaction going on between the TDP MMU and Hyper-V, with
> or without enlightened TLB. My laptop has 16 vCPUs, so the WSL VM also has 16 vCPUs.
> I have hardcoded the kernel to disable enlightened TLB (so we know that is not interfering).
> I'm running a Flatcar Linux VM inside the WSL VM using legacy BIOS, a single CPU,
> and 4GB of RAM.
>
> If I run with `kvm.tdp_mmu=0`, I can boot and shut down my VM consistently in 20 seconds.
>
> If I run with the TDP MMU, the VM boot stalls for seconds at a time in various spots
> (loading grub, decompressing the kernel, during kernel boot); the boot output feels like
> it's happening in slow motion. The fastest I've seen it finish the same cycle is 2 minutes,
> I have also seen it take 4 minutes, and sometimes it doesn't finish at all. Same everything;
> the only difference is the value of `kvm.tdp_mmu`.

When a stall occurs, can you tell where the time is lost? E.g. is the CPU stuck
in L0, L1, or L2? L2 being a single vCPU rules out quite a few scenarios, e.g.
lock contention and whatnot.

If you can run perf in WSL, that might be the easiest way to suss out what's going on.

> So I would like to revisit disabling the TDP MMU on Hyper-V altogether for the time being,
> but it should probably be with the following condition:
>
>   tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled && !hypervisor_is_type(X86_HYPER_MS_HYPERV)
>
> Do you have an environment where you would be able to reproduce this? A Windows Server
> perhaps, or an AMD laptop?

Hrm, not easily, no.

Can you try two things?

  1. Linus' tree on Intel hardware
  2. kvm-x86/next[*] on Intel hardware

Don't bother with #2 if #1 (Linus' tree) does NOT suffer the same stalls as AMD. #2 is
interesting iff Intel is also affected, as kvm-x86/next has an optimization for CR0.WP
toggling, which was the Achilles' heel of the TDP MMU. If Intel isn't affected, then
something other than CR0.WP is to blame.

I fully expect both experiments to show the same behavior as AMD, but if for some reason
they don't, the results should help narrow the search.

[*] https://github.com/kvm-x86/linux/tree/next
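
If we do end up going the disable route, here's a rough sketch of how I read Jeremi's
condition. Completely untested, and I'm assuming kvm_configure_mmu() in
arch/x86/kvm/mmu/mmu.c is still the one place that computes tdp_mmu_enabled, and that
<asm/hypervisor.h> is (or gets) pulled in there for hypervisor_is_type():

#ifdef CONFIG_X86_64
	/*
	 * Sketch only: keep the TDP MMU off when KVM itself is running on
	 * Hyper-V, so shadow paging stays the default on that setup until
	 * the stalls are understood.
	 */
	tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled &&
			  !hypervisor_is_type(X86_HYPER_MS_HYPERV);
#endif

That should leave the kvm.tdp_mmu module param plumbing untouched and only change what
tdp_mmu_enabled ends up as, but again, I haven't tried it.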