On 2/25/22 19:22, Sean Christopherson wrote:
@@ -5656,7 +5707,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm) * Note: we need to do this under the protection of mmu_lock, * otherwise, vcpu would purge shadow page but miss tlb flush. */ - kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD); + kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);
I was going to squash in this: * invalidating TDP MMU roots must be done while holding mmu_lock for - * write and in the same critical section as making the reload request, + * write and in the same critical section as making the free request, * e.g. before kvm_zap_obsolete_pages() could drop mmu_lock and yield. But then I realized that this needs better comments and that my knowledge of this has serious holes. Regarding this comment, this is my proposal: /* * Invalidated TDP MMU roots are zapped within MMU read_lock to be * able to walk the list of roots, but with the expectation of no * concurrent change to the pages themselves. There cannot be * any yield between kvm_tdp_mmu_invalidate_all_roots and the free * request, otherwise somebody could grab a reference to the root * and break that assumption. */ if (is_tdp_mmu_enabled(kvm)) kvm_tdp_mmu_invalidate_all_roots(kvm); However, for the second comment (the one in the context above), there's much more. From easier to harder: 1) I'm basically clueless about the TLB flush "note" above. 2) It's not clear to me what needs to use for_each_tdp_mmu_root; for example, why would anything but the MMU notifiers use for_each_tdp_mmu_root? It is used in kvm_tdp_mmu_write_protect_gfn, kvm_tdp_mmu_try_split_huge_pages and kvm_tdp_mmu_clear_dirty_pt_masked. 3) Does it make sense that yielding users of for_each_tdp_mmu_root must either look at valid roots only, or take MMU lock for write? If so, can this be enforced in tdp_mmu_next_root? 4) If the previous point is correct, _who_ could grab a reference and not release it before kvm_tdp_mmu_zap_invalidated_roots runs? That is, is "somebody could grab a reference" an accurate explanation in the first comment above? Thanks, Paolo