On 07/03/2023 11:07, Vitaly Kuznetsov wrote:
> Jeremi Piotrowski <jpiotrowski@xxxxxxxxxxxxxxxxxxx> writes:
>
>> On 06/03/2023 18:52, Vitaly Kuznetsov wrote:
>>> Jeremi Piotrowski <jpiotrowski@xxxxxxxxxxxxxxxxxxx> writes:
>>>
>>>> TDP MMU has been broken on AMD CPUs when running on Hyper-V since v5.17.
>>>> The issue was first introduced by two commits:
>>>>
>>>> - bb95dfb9e2dfbe6b3f5eb5e8a20e0259dadbe906 "KVM: x86/mmu: Defer TLB
>>>>   flush to caller when freeing TDP MMU shadow pages"
>>>> - efd995dae5eba57c5d28d6886a85298b390a4f07 "KVM: x86/mmu: Zap defunct
>>>>   roots via asynchronous worker"
>>>>
>>>> The root cause is that since then there are missing TLB flushes which
>>>> are required by HV_X64_NESTED_ENLIGHTENED_TLB.
>>>
>>> Please share more details on what's actually missing as you get them,
>>> I'd like to understand which flushes can be legally avoided on bare
>>> hardware and Hyper-V/VMX but not on Hyper-V/SVM.
>>>
>>
>> See the linked thread here
>> https://lore.kernel.org/lkml/20d189fc-8d20-8083-b448-460cc0420151@xxxxxxxxxxxxxxxxxxx/#t
>> for all the details/analyses, but the summary is that either of these two
>> options would work, with a) issuing fewer flushes (though fewer flushes is
>> not necessarily better):
>>
>> a) adding a hyperv_flush_guest_mapping(__pa(root->spt)) after
>>    kvm_tdp_mmu_get_vcpu_root_hpa's call to tdp_mmu_alloc_sp()
>> b) adding a hyperv_flush_guest_mapping(vcpu->arch.mmu->root.hpa) to
>>    svm_flush_tlb_current()
>>
>> These are only needed on Hyper-V/SVM because of how the enlightenment
>> works (it needs an explicit flush to rebuild the L0 shadow page tables).
>> Hyper-V/VMX does not need any changes and currently works. Let me know if
>> you need more information on something here, I'll try to get it.
>>
>
> Ah, I missed the whole party! Thanks for the pointers!
>
>>>> The failure manifests
>>>> as L2 guest VMs being unable to complete boot due to memory
>>>> inconsistencies between L1 and L2 guests which lead to various
>>>> assertion/emulation failures.
>
> Which levels are we talking about here, *real* L1 and L2 or L1 and L2
> from KVM's perspective (real L2 and L3)?

Real L1 and L2. In this whole discussion L0 is Hyper-V, L1 is KVM and L2 is
a Linux VM.

>
>>>>
>>>> The HV_X64_NESTED_ENLIGHTENED_TLB enlightenment is always exposed by
>>>> Hyper-V on AMD and is always used by Linux. The TLB flush required by
>>>> HV_X64_NESTED_ENLIGHTENED_TLB is much stricter than the local TLB flush
>>>> that TDP MMU wants to issue. We have also found that with TDP MMU, L2
>>>> guest boot performance on AMD is reproducibly slower compared to when
>>>> TDP MMU is disabled.
>>>>
>>>> Disable TDP MMU when using SVM Hyper-V for the time being while we
>>>> search for a better fix.
>>>
>>> I'd suggest we go the other way around: disable
>>> HV_X64_NESTED_ENLIGHTENED_TLB on SVM:
>>
>> Paolo suggested disabling TDP_MMU when HV_X64_NESTED_ENLIGHTENED_TLB is
>> used, and I prefer that option too. The enlightenment does offer a nice
>> performance advantage with non-TDP_MMU, and I did not see TDP_MMU perform
>> any better compared to that. Afaik the code to use the enlightenment on
>> Hyper-V/SVM was written/tested before TDP_MMU became the default.
>>
>> If you have a specific scenario in mind, we could test and see what the
>> implications are there.
>
> I don't have a strong opinion here, I've suggested a smaller change so
> it's easier to backport it to stable kernels and easier to revert when a
> proper fix comes to mainline.

Noted. My concern here is about changing a default in a way that lowers
performance, because the proper fix that comes later might end up not being
suitable for stable.

> For performance implications, I'd only consider non-nested scenarios from
> KVM's perspective (i.e. real L2 from Hyper-V's PoV), as running L3 is
> unlikely a common use-case and, if I understood correctly, is broken
> anyway.

I agree with that. Right now L2 is broken; I've never even attempted L3 to
see if it would work.

Jeremi
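[Editor's note: a rough, untested sketch of option (b) quoted above, for
illustration only. It assumes the existing svm_flush_tlb_current() in
arch/x86/kvm/svm/svm.c, the hyperv_flush_guest_mapping() helper, and that
kvm_x86_ops.tlb_remote_flush is pointed at hv_remote_flush_tlb when the
enlightened TLB flush is in use; the gating condition is a guess, not taken
from any actual patch in this thread.]

/*
 * Sketch of option (b) only -- illustrative, not an actual patch.
 */
static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
{
	/*
	 * With HV_X64_NESTED_ENLIGHTENED_TLB, Hyper-V (L0) shadows KVM's
	 * NPT and only rebuilds that shadow on an explicit hypercall
	 * (HvCallFlushGuestPhysicalAddressSpace, issued by
	 * hyperv_flush_guest_mapping()), so the local ASID flush alone is
	 * not sufficient on SVM.
	 */
	if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb &&
	    VALID_PAGE(vcpu->arch.mmu->root.hpa))
		hyperv_flush_guest_mapping(vcpu->arch.mmu->root.hpa);

	/* ... existing local TLB flush (ASID bump) stays as-is ... */
}

Option (a) would instead issue the same hypercall once per root, right after
kvm_tdp_mmu_get_vcpu_root_hpa() allocates the root via tdp_mmu_alloc_sp(),
trading fewer flushes for a different placement.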