On 02/04/2013 09:42 PM, Marcelo Tosatti wrote:
> On Wed, Jan 23, 2013 at 06:44:52PM +0800, Xiao Guangrong wrote:
>> On 01/23/2013 06:12 PM, Takuya Yoshikawa wrote:
>>> This patch set mitigates another mmu_lock hold time issue. Although
>>> this is not enough and I'm thinking of additional work already, this
>>> alone can reduce the lock hold time to some extent.
>>>
>>
>> It is not worth doing this kind of complex thing, usually, only a few pages on
>> the invalid list.
>
> I think its a good idea - memory freeing can be done outside mmu_lock
> protection (as long as its bounded). It reduces mmu_lock contention
> overall.

It does not help much, since we still need to walk and delete all shadow
pages, rmaps and parent-pte lists - that still costs a lot of time and is
not good for scalability.

>
>> The *really* heavily case is kvm_mmu_zap_all() which can be speeded
>> up by using generation number, this is a todo work in kvm wiki:
>>
>> http://www.linux-kvm.org/page/TODO: O(1) mmu invalidation using a generation number
>>
>> I am doing this work for some weeks and will post the patch out during these days.
>
> Can you describe the generation number scheme in more detail, please?

Yes, but I currently use a simple way instead of the generation number.

The idea of the optimization is that we can switch the hashtable and rmaps
to new ones, so that later page faults install shadow pages and rmaps on
the new ones, and the old ones can be freed directly outside of mmu_lock.

In more detail:

zap_all_shadow_pages:
	hold mmu_lock;

	LIST_HEAD(active_list);
	LIST_HEAD(pte_list_desc);

	/*
	 * Prepare the root shadow pages since they can not be
	 * freed directly.
	 */
	for_each_root_sp(sp, mmu->root_sp_list) {
		prepare_zap(sp);

		/* Delete it from mmu->active_list. */
		list_del_init(sp->link);
	}

	/* Zap the hashtable and rmap. */
	memset(mmu->hashtable, 0);
	memset(memslot->rmap, 0);

	list_replace_init(mmu->active_sp_list, active_list);

	/* All the pte_list_desc used for rmap and the parent list. */
	list_replace_init(mmu->pte_list_desc_list, pte_list_desc);

	/* Reload mmu, let the old shadow pages be zapped. */
	kvm_reload_remote_mmus(kvm);

	release_mmu_lock;

	for_each_sp_on_active_list(sp, active_list)
		kvm_mmu_free_page(sp);

	for_each_pte_desc(desc, pte_list_desc)
		mmu_free_pte_list_desc(desc);

The patches are being tested on my box; they work well and improve
zap_all_shadow_pages by more than 75%.

============
Note: later we can use the generation number to continue optimizing it:

zap_all_shadow_pages:
	generation_number++;
	kvm_reload_remote_mmus(kvm);

And, on the unload_mmu path:

	hold mmu_lock;
	if (kvm->generation_number != generation_number) {
		switch the hashtable and rmap to new ones;
		kvm->generation_number = generation_number;
	}
	release mmu_lock;

	free the old ones;

We need to adjust the page-fault and sync-children code so that they do
not install sps in the old shadow page cache.
=============
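
For illustration only, here is a minimal user-space sketch of the
generation-number variant sketched above. All names in it (struct kvm_like,
struct mmu_cache, reload_mmu, HASH_BUCKETS) are hypothetical simplifications,
not the real KVM structures; it only shows the locking pattern: zap_all bumps
the generation under mmu_lock, and the reload path swaps in a fresh cache
under mmu_lock while the expensive teardown of the old cache happens outside
the lock.

#include <pthread.h>
#include <stdlib.h>

#define HASH_BUCKETS 4096

/* Hypothetical stand-in for the sp hashtable and memslot rmap. */
struct mmu_cache {
	void *hashtable[HASH_BUCKETS];
	void *rmap;
};

struct kvm_like {
	pthread_mutex_t mmu_lock;
	unsigned long generation;	/* bumped by zap_all */
	unsigned long cache_generation;	/* generation the current cache belongs to */
	struct mmu_cache *cache;
};

/* zap_all only bumps the generation and (in real code) kicks the vcpus. */
static void zap_all_shadow_pages(struct kvm_like *kvm)
{
	pthread_mutex_lock(&kvm->mmu_lock);
	kvm->generation++;
	/* kvm_reload_remote_mmus(kvm) would be called here. */
	pthread_mutex_unlock(&kvm->mmu_lock);
}

/* Reload/unload path: swap in a fresh cache, free the old one unlocked. */
static void reload_mmu(struct kvm_like *kvm)
{
	struct mmu_cache *old = NULL;

	pthread_mutex_lock(&kvm->mmu_lock);
	if (kvm->cache_generation != kvm->generation) {
		old = kvm->cache;
		kvm->cache = calloc(1, sizeof(*kvm->cache));
		kvm->cache_generation = kvm->generation;
	}
	pthread_mutex_unlock(&kvm->mmu_lock);

	/* The expensive teardown happens outside mmu_lock. */
	free(old);
}

In the real code the "free the old ones" step would walk and free the old
shadow pages and pte_list_desc entries rather than a single free(), but the
point of the scheme is the same: only the pointer swap and generation check
are done under mmu_lock.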