On 04/18/2013 09:29 PM, Marcelo Tosatti wrote:
> On Thu, Apr 18, 2013 at 10:03:06AM -0300, Marcelo Tosatti wrote:
>> On Thu, Apr 18, 2013 at 12:00:16PM +0800, Xiao Guangrong wrote:
>>>>
>>>> What is the justification for this?
>>>
>>> We want the rmap of the being-deleted memslot to be remove-only; that is
>>> needed for unmapping the rmap outside of mmu-lock.
>>>
>>> ======
>>> 1) do not corrupt the rmap
>>> 2) keep pte-list-descs available
>>> 3) keep shadow pages available
>>>
>>> Resolve 1):
>>> we make the invalid rmap remove-only, meaning we only delete and clear
>>> sptes from the rmap; no new sptes can be added to it.
>>> This is reasonable since kvm can not do address translation on an invalid
>>> rmap (gfn_to_pfn fails on an invalid memslot) and all sptes on an invalid
>>> rmap can not be reused (they belong to invalid shadow pages).
>>> ======
>>>
>>> clear_flush_young / test_young / change_pte of mmu-notify can rewrite
>>> the rmap with a present spte (P bit is set), so we should unmap the rmap
>>> in these handlers.
>>>
>>>>
>>>>> +
>>>>> +    /*
>>>>> +     * To ensure that all vcpus and mmu-notify are not clearing
>>>>> +     * spte and rmap entry.
>>>>> +     */
>>>>> +    synchronize_srcu_expedited(&kvm->srcu);
>>>>> +}
>>>>> +
>>>>>  #ifdef MMU_DEBUG
>>>>>  static int is_empty_shadow_page(u64 *spt)
>>>>>  {
>>>>> @@ -2219,6 +2283,11 @@ static void clear_sp_write_flooding_count(u64 *spte)
>>>>>      __clear_sp_write_flooding_count(sp);
>>>>>  }
>>>>>
>>>>> +static bool is_valid_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>>>>> +{
>>>>> +    return likely(sp->mmu_valid_gen == kvm->arch.mmu_valid_gen);
>>>>> +}
>>>>> +
>>>>>  static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>>>                                               gfn_t gfn,
>>>>>                                               gva_t gaddr,
>>>>> @@ -2245,6 +2314,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>>>          role.quadrant = quadrant;
>>>>>      }
>>>>>      for_each_gfn_sp(vcpu->kvm, sp, gfn) {
>>>>> +        if (!is_valid_sp(vcpu->kvm, sp))
>>>>> +            continue;
>>>>> +
>>>>>          if (!need_sync && sp->unsync)
>>>>>              need_sync = true;
>>>>>
>>>>> @@ -2281,6 +2353,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>>>>>
>>>>>          account_shadowed(vcpu->kvm, gfn);
>>>>>      }
>>>>> +    sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
>>>>>      init_shadow_page_table(sp);
>>>>>      trace_kvm_mmu_get_page(sp, true);
>>>>>      return sp;
>>>>> @@ -2451,8 +2524,12 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>>>>>      ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
>>>>>      kvm_mmu_page_unlink_children(kvm, sp);
>>>>
>>>> The rmaps[] arrays linked to !is_valid_sp() shadow pages should not be
>>>> accessed (as they have been freed already).
>>>>
>>>> I suppose the is_valid_sp() conditional below should be moved earlier,
>>>> before kvm_mmu_unlink_parents or any other rmap access.
>>>>
>>>> This is fine: the !is_valid_sp() shadow pages are only reachable
>>>> by SLAB and the hypervisor itself.
>>>
>>> Unfortunately we can not do this. :(
>>>
>>> The sptes in a shadow page can be linked into many slots. If a spte is
>>> linked into the rmap of the being-deleted memslot, that is ok; otherwise,
>>> the rmap of a still-used memslot is mis-updated.
>>>
>>> For example, slot 0 is being deleted, sp->spte[0] is linked on
>>> slot[0].rmap, and sp->spte[1] is linked on slot[1].rmap. If we do not
>>> access the rmaps of this 'sp', the already-freed spte[1] is still linked
>>> on slot[1].rmap.
>>>
>>> We could let kvm update the rmap for sp->spte[1] and not unlink
>>> sp->spte[0]. This is also not allowed, since mmu-notify can access the
>>> invalid rmap before the memslot is destroyed; then mmu-notify would get
>>> an already-freed spte from the rmap, or page Accessed/Dirty would be
>>> mis-tracked (if mmu-notify were not allowed to access the invalid rmap).
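To make the dangling-rmap hazard above concrete, here is a minimal
standalone sketch. This is NOT the real KVM code: struct memslot,
rmap_add() and rmap_remove() below are simplified stand-ins for the
per-slot rmap lists and the pte-list helpers; it only shows why the rmap
of a still-live slot must still be walked when a shadow page is freed.

/*
 * Toy model (not KVM code): one shadow page whose sptes are tracked by
 * the rmaps of two different memslots. If the page were freed without
 * unlinking sp->spte[1], slot[1].rmap would keep a pointer into freed
 * memory.
 */
#include <stdio.h>
#include <stdlib.h>

struct rmap_entry {                     /* one tracked spte */
    unsigned long long *sptep;
    struct rmap_entry *next;
};

struct memslot {
    struct rmap_entry *rmap;            /* head of this slot's rmap list */
};

struct shadow_page {
    unsigned long long spte[2];         /* spte[0] -> slot 0, spte[1] -> slot 1 */
};

static void rmap_add(struct memslot *slot, unsigned long long *sptep)
{
    struct rmap_entry *e = malloc(sizeof(*e));

    e->sptep = sptep;
    e->next = slot->rmap;
    slot->rmap = e;
}

static void rmap_remove(struct memslot *slot, unsigned long long *sptep)
{
    struct rmap_entry **p;

    for (p = &slot->rmap; *p; p = &(*p)->next)
        if ((*p)->sptep == sptep) {
            struct rmap_entry *dead = *p;

            *p = dead->next;
            free(dead);
            return;
        }
}

int main(void)
{
    struct memslot slot[2] = { { NULL }, { NULL } };
    struct shadow_page *sp = calloc(1, sizeof(*sp));

    rmap_add(&slot[0], &sp->spte[0]);   /* slot 0: being deleted */
    rmap_add(&slot[1], &sp->spte[1]);   /* slot 1: stays alive   */

    /*
     * Zapping 'sp': slot[0]'s rmap dies together with the deleted slot,
     * but slot[1].rmap MUST be walked, otherwise it would still point
     * at sp->spte[1] after free(sp) -- the corruption described above.
     */
    free(slot[0].rmap);                  /* torn down with the slot     */
    rmap_remove(&slot[1], &sp->spte[1]); /* the unlink we can not skip  */
    free(sp);

    printf("slot1 rmap after zap: %p\n", (void *)slot[1].rmap); /* (nil) */
    return 0;
}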
>> Why not release all rmaps?
>>
>> Subject: [PATCH v2 3/7] KVM: x86: introduce kvm_clear_all_gfn_page_info
>>
>> This function is used to reset the rmaps and page info of all guest pages,
>> which will be used in a later patch
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxxxxxx>
>
> Which you have later in the patchset.

The patch you mentioned is old (v2); now it only resets the lpage-info,
excluding the rmap:

======
[PATCH v3 11/15] KVM: MMU: introduce kvm_clear_all_lpage_info

This function is used to reset the large page info of all guest pages,
which will be used in a later patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxxxxxx>
======

We can not release all rmaps. If we did, ->invalidate_page and
->invalidate_range_start could not find any spte using the host page;
that means Accessed/Dirty tracking for the host page would be lost
(kvm_set_pfn_accessed and kvm_set_pfn_dirty would not be called properly).

Furthermore, when we dropped an invalid-gen spte, we would call
kvm_set_pfn_accessed/kvm_set_pfn_dirty for an already-freed host page,
since mmu-notify could not find the spte through the rmap. (We could skip
drop-spte for an invalid-gen sp, but then A/D for the host page can be
missed.)

That is why I introduced unmap_invalid_rmap outside of mmu-lock.

> So, what is the justification for the zap root + generation number increase
> to work on a per memslot basis, given that
>
>     /*
>      * If memory slot is created, or moved, we need to clear all
>      * mmio sptes.
>      */
>     if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
>         kvm_mmu_zap_mmio_sptes(kvm);
>         kvm_reload_remote_mmus(kvm);
>     }
>
> is going to be dealt with by the generation number on mmio spte idea?

Yes.

Actually, this patchset (v3) is based on two other patchsets:

======
This patchset is based on my previous two patchsets:
[PATCH 0/2] KVM: x86: avoid potential soft lockup and unneeded mmu reload
(https://lkml.org/lkml/2013/4/1/2)

[PATCH v2 0/6] KVM: MMU: fast invalid all mmio sptes
(https://lkml.org/lkml/2013/4/1/134)
======

We did that in [PATCH v2 0/6] KVM: MMU: fast invalid all mmio sptes.

> Note that at the moment all shadow pages are zapped on deletion / move,
> and there is no performance complaint for those cases.

Yes, zap-mmio is only needed for KVM_MR_CREATE.

> In fact, what case does the generation number on mmio sptes optimize for?
> The cases where slots are deleted/moved/created on a live guest are:
>
> - Legacy VGA mode operation where VGA slots are created/deleted. Zapping
>   all shadow pages is not a performance issue in that case.
> - Device hotplug (not performance critical).
> - Remapping of PCI memory regions (not a performance issue).
> - Memory hotplug (not a performance issue).
>
> These are all rare events in which there is no particular concern about
> rebuilding shadow pages to resume cruise-speed operation.
>
> So from this POV (please correct if not accurate), avoiding problems
> with a huge number of shadow pages is all that is being asked for.
>
> Which is handled nicely by zap roots + sp gen number increase.

So, can we use "zap roots + sp gen number" instead of the current
zap-mmio-sp? If yes, I totally agree with you.
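To be explicit about what "sp gen number" buys us, a toy model of the
generation check is below. Only mmu_valid_gen and the is_valid_sp() test
mirror the patch quoted earlier; the *_model names are illustrative
stand-ins, not real KVM types.

/*
 * Toy model (not KVM code) of "zap roots + sp generation number":
 * bumping the global generation makes every existing shadow page stale
 * in O(1); a lookup such as kvm_mmu_get_page() then skips stale pages
 * and builds fresh ones, while stale pages are freed lazily.
 */
#include <stdbool.h>
#include <stdio.h>

struct kvm_model {
    unsigned long mmu_valid_gen;        /* global generation number */
};

struct sp_model {
    unsigned long mmu_valid_gen;        /* generation the sp was created in */
};

/* mirrors is_valid_sp() from the patch quoted above */
static bool is_valid_sp_model(struct kvm_model *kvm, struct sp_model *sp)
{
    return sp->mmu_valid_gen == kvm->mmu_valid_gen;
}

int main(void)
{
    struct kvm_model kvm = { .mmu_valid_gen = 0 };
    struct sp_model sp = { .mmu_valid_gen = kvm.mmu_valid_gen };

    /* invalidate ALL shadow pages at once: just bump the generation */
    kvm.mmu_valid_gen++;

    /* a lookup would now skip 'sp' and allocate a new page instead */
    printf("sp still valid? %d\n", is_valid_sp_model(&kvm, &sp)); /* 0 */
    return 0;
}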