On 02/04/2013 09:42 PM, Marcelo Tosatti wrote:
> On Wed, Jan 23, 2013 at 06:44:52PM +0800, Xiao Guangrong wrote:
>> On 01/23/2013 06:12 PM, Takuya Yoshikawa wrote:
>>> This patch set mitigates another mmu_lock hold time issue. Although
>>> this is not enough and I'm thinking of additional work already, this
>>> alone can reduce the lock hold time to some extent.
>>>
>>
>> It is not worth doing this kind of complex thing, usually, only a few pages on
>> the invalid list.
>
> I think its a good idea - memory freeing can be done outside mmu_lock
> protection (as long as its bounded). It reduces mmu_lock contention
> overall.

It does not help much, since we still need to walk and delete all shadow
pages, rmaps and parent-pte lists - that still costs a lot of time and is
not good for scalability.

>
>> The *really* heavily case is kvm_mmu_zap_all() which can be speeded
>> up by using generation number, this is a todo work in kvm wiki:
>>
>> http://www.linux-kvm.org/page/TODO: O(1) mmu invalidation using a generation number
>>
>> I am doing this work for some weeks and will post the patch out during these days.
>
> Can you describe the generation number scheme in more detail, please?

Yes, but I currently use a simple way instead of the generation number.

The idea of the optimization is that we can switch the hashtable and rmaps
to new ones, so that later page faults install shadow pages and rmaps on
the new ones, and the old ones can be freed directly outside of mmu_lock.

In more detail:

zap_all_shadow_pages:
	hold mmu_lock;

	LIST_HEAD(active_list);
	LIST_HEAD(pte_list_desc);

	/*
	 * Prepare the root shadow pages since they can not be
	 * freed directly.
	 */
	for_each_root_sp(sp, mmu->root_sp_list) {
		prepare_zap(sp);

		/* Delete it from mmu->active_list. */
		list_del_init(sp->link);
	}

	/* Zap the hashtable and rmap. */
	memset(mmu->hashtable, 0);
	memset(memslot->rmap, 0);

	list_replace_init(mmu->active_sp_list, active_list);

	/* All the pte_list_desc used for rmap and the parent list. */
	list_replace_init(mmu->pte_list_desc_list, pte_list_desc);

	/* Reload mmu, let the old shadow pages be zapped. */
	kvm_reload_remote_mmus(kvm);

	release_mmu_lock;

	for_each_sp_on_active_list(sp, active_list)
		kvm_mmu_free_page(sp);

	for_each_pte_desc(desc, pte_list_desc)
		mmu_free_pte_list_desc(desc);

The patches are being tested on my box; they work well and improve
zap_all_shadow_pages by more than 75%.

============
Note: later we can use the generation number to continue optimizing it:

zap_all_shadow_pages:
	generation_number++;
	kvm_reload_remote_mmus(kvm);

And, on the unload_mmu path:

	hold mmu_lock;
	if (kvm->generation_number != generation_number) {
		switch the hashtable and rmap to new ones;
		kvm->generation_number = generation_number;
	}
	release mmu_lock;

	free the old ones;

We need to adjust the page-fault and sync-children code so that they do
not install sps in the old shadow page cache.
=============
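
For illustration only, here is a minimal user-space sketch of the
generation-number variant sketched above. All names in it (struct kvm_like,
struct mmu_cache, reload_mmu, HASH_BUCKETS) are hypothetical simplifications,
not the real KVM structures; it only shows the locking pattern: zap_all bumps
the generation under mmu_lock, and the reload path swaps in a fresh cache
under mmu_lock while the expensive teardown of the old cache happens outside
the lock.

#include <pthread.h>
#include <stdlib.h>

#define HASH_BUCKETS 4096

/* Hypothetical stand-in for the sp hashtable and memslot rmap. */
struct mmu_cache {
	void *hashtable[HASH_BUCKETS];
	void *rmap;
};

struct kvm_like {
	pthread_mutex_t mmu_lock;
	unsigned long generation;	/* bumped by zap_all */
	unsigned long cache_generation;	/* generation the current cache belongs to */
	struct mmu_cache *cache;
};

/* zap_all only bumps the generation and (in real code) kicks the vcpus. */
static void zap_all_shadow_pages(struct kvm_like *kvm)
{
	pthread_mutex_lock(&kvm->mmu_lock);
	kvm->generation++;
	/* kvm_reload_remote_mmus(kvm) would be called here. */
	pthread_mutex_unlock(&kvm->mmu_lock);
}

/* Reload/unload path: swap in a fresh cache, free the old one unlocked. */
static void reload_mmu(struct kvm_like *kvm)
{
	struct mmu_cache *old = NULL;

	pthread_mutex_lock(&kvm->mmu_lock);
	if (kvm->cache_generation != kvm->generation) {
		old = kvm->cache;
		kvm->cache = calloc(1, sizeof(*kvm->cache));
		kvm->cache_generation = kvm->generation;
	}
	pthread_mutex_unlock(&kvm->mmu_lock);

	/* The expensive teardown happens outside mmu_lock. */
	free(old);
}

In the real code the "free the old ones" step would walk and free the old
shadow pages and pte_list_desc entries rather than a single free(), but the
point of the scheme is the same: only the pointer swap and generation check
are done under mmu_lock.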