On Tue, Dec 10, 2024 at 3:56 PM Kevin Loughlin <kevinloughlin@xxxxxxxxxx> wrote:
>
> On Wed, Dec 4, 2024 at 2:07 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Wed, Dec 04, 2024, Kevin Loughlin wrote:
> > > On Tue, Dec 3, 2024 at 4:27 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > > > @@ -2152,7 +2191,7 @@ void sev_vm_destroy(struct kvm *kvm)
> > > > >          * releasing the pages back to the system for use. CLFLUSH will
> > > > >          * not do this, so issue a WBINVD.
> > > > >          */
> > > > > -       wbinvd_on_all_cpus();
> > > > > +       sev_do_wbinvd(kvm);
> > > >
> > > > I am 99% certain this wbinvd_on_all_cpus() can simply be dropped. sev_vm_destroy()
> > > > is called after KVM's mmu_notifier has been unregistered, which means it's called
> > > > after kvm_mmu_notifier_release() => kvm_arch_guest_memory_reclaimed().
> > >
> > > I think we need a bit of rework before dropping it (which I propose we
> > > do in a separate series), but let me know if there's a mistake in my
> > > reasoning here...
> > >
> > > Right now, sev_guest_memory_reclaimed() issues writebacks for SEV and
> > > SEV-ES guests but does *not* issue writebacks for SEV-SNP guests.
> > > Thus, I believe it's possible a SEV-SNP guest reaches sev_vm_destroy()
> > > with dirty encrypted lines in processor caches. Because SME_COHERENT
> > > doesn't guarantee coherence across CPU-DMA interactions (d45829b351ee
> > > ("KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT
> > > CPUs")), it seems possible that the memory gets re-allocated for DMA,
> > > written back from an (unencrypted) DMA, and then corrupted when the
> > > dirty encrypted version gets written back over that, right?
> > >
> > > And potentially the same thing for why we can't yet drop the writeback
> > > in sev_flush_encrypted_page() without a bit of rework?
> >
> > Argh, this last one probably does apply to SNP. KVM requires SNP VMs to be backed
> > with guest_memfd, and flushing for that memory is handled by sev_gmem_invalidate().
> > But the VMSA is kernel allocated and so needs to be flushed manually. On the plus
> > side, the VMSA flush shouldn't use WB{NO}INVD unless things go sideways, so trying
> > to optimize that path isn't worth doing.
>
> Ah thanks, yes agreed for both (that dropping WB{NO}INVD is fine on
> the sev_vm_destroy() path given sev_gmem_invalidate() and that the
> sev_flush_encrypted_page() path still needs the WB{NO}INVD as a
> fallback for now).
>
> On that note, the WBINVD in sev_mem_enc_unregister_region() can be
> dropped too then, right? My understanding is that the host will
> instead do WB{NO}INVD for SEV(-ES) guests in
> sev_guest_memory_reclaimed(), and sev_gmem_invalidate() will handle
> SEV-SNP guests.

Nevermind, we can't drop the WBINVD call in
sev_mem_enc_unregister_region() without a userspace opt-in because
userspace might otherwise rely on the flushing behavior; see Sean's
explanation in [0].

So all-in-all I believe...

- we can drop the call in sev_vm_destroy()
- we *cannot* drop the call in sev_flush_encrypted_page(), nor in
  sev_mem_enc_unregister_region().

Zheyun, if you get to this series before my own WBNOINVD series [1],
I can just rebase on top of yours. I will defer cutting these
unneeded calls to you and simply replace applicable WBINVD calls with
WBNOINVD in my series.

[0] https://lore.kernel.org/all/ZWrM622xUb4pe7gS@xxxxxxxxxx/T/#md364d1fdfc65dc92e306276bd51298cb817c5e53
[1] https://lore.kernel.org/kvm/20250109225533.1841097-2-kevinloughlin@xxxxxxxxxx/T/

>
> All in all, I now agree we can drop the unneeded case(s) of issuing
> WB{NO}INVDs in this series in an additional commit. I'll then rebase
> [0] on the latest version of this series and can also work on the
> migration optimizations atop all of it, if that works for you Sean.
>
> [0] https://lore.kernel.org/lkml/20241203005921.1119116-1-kevinloughlin@xxxxxxxxxx/
>
> Thanks!
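
For anyone skimming the thread, here is a minimal sketch of the kind of
targeted flush sev_do_wbinvd() is meant to provide, i.e. WBINVD only on
the physical CPUs that could actually hold dirty encrypted lines for
this VM instead of every CPU in the system. This is not the code from
Zheyun's series; in particular, the have_run_cpus cpumask is a
hypothetical placeholder for however the series actually tracks which
CPUs have run the VM's vCPUs.

/* Sketch only; on_each_cpu_mask() is from <linux/smp.h>, wbinvd() from
 * <asm/special_insns.h>, to_kvm_svm()/kvm_sev_info from svm.h.
 */
static void sev_wbinvd_cpu(void *unused)
{
	wbinvd();
}

static void sev_do_wbinvd(struct kvm *kvm)
{
	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

	/*
	 * Hypothetical per-VM mask of physical CPUs that have entered
	 * this VM; only those CPUs can have dirty encrypted cache
	 * lines for the VM's memory.
	 */
	if (cpumask_empty(sev->have_run_cpus))
		return;

	/* IPI only the tracked CPUs and wait for the flushes to finish. */
	on_each_cpu_mask(sev->have_run_cpus, sev_wbinvd_cpu, NULL, true);
}

Presumably that scoping is also what makes the cleanup discussed above
attractive: the callers that must keep a flush (the
sev_flush_encrypted_page() fallback and sev_mem_enc_unregister_region())
would no longer need to interrupt every CPU in the system.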