Hi Marcelo, On 08/23/2011 04:00 PM, Marcelo Tosatti wrote: > On Tue, Aug 16, 2011 at 02:46:47PM +0800, Xiao Guangrong wrote: >> Detecting write-flooding does not work well, when we handle page written, if >> the last speculative spte is not accessed, we treat the page is >> write-flooding, however, we can speculative spte on many path, such as pte >> prefetch, page synced, that means the last speculative spte may be not point >> to the written page and the written page can be accessed via other sptes, so >> depends on the Accessed bit of the last speculative spte is not enough > > Yes, a stale last_speculative_spte is possible, but is this fact a > noticeable problem in practice? > > Was this detected by code inspection? > I detected this because: i noticed some shadow page is zapped by write-flooding but it is accessed soon, it causes the shadow page zapped and alloced again and again(very frequently). Another reason is that: in current code, write-flooding is little complex and it stuffs code in many places, actually, write-flooding is only needed for shadow page/nested guest, so i want to simplify it and wrap its code up. >> - } >> + if (spte && !(*spte & shadow_accessed_mask)) >> + sp->write_flooding_count++; >> + else >> + sp->write_flooding_count = 0; > > This relies on the sptes being created by speculative means > or by pressure on the host clearing the accessed bit for the > shadow page to be zapped. > > There is no guarantee that either of these is true for a given > spte. > > And if the sptes do not have accessed bit set, any nonconsecutive 3 pte > updates will zap the page. > Please note we clear 'sp->write_flooding_count' when it is accessed from shadow page cache (in kvm_mmu_get_page), it means if any spte of sp generates #PF, the fooding count can be reset. And, i think there are not problems since: if the spte without accssed bit is written frequently, it means the guest page table is accessed infrequently or during the writing, the guest page table is not accessed, in this time, zapping this shadow page is not bad. Comparing the old way, the advantage of it is good for zapping upper shadow page, for example, in the old way: if a gfn is used as PDE for a task, later, the gfn is freed and used as PTE for the new task, so we have two shadow pages in the host, one sp1.level = 2 and the other sp2.level = 1. So, when we detect write-flooding, the vcpu->last_pte_updated always point to sp2.pte. As sp2 is used for the new task, we always detected both shadow pages are bing used, but actually, sp1 is not used by guest anymore. > Back to the first question, what is the motivation for this heuristic > change? Do you have any numbers? > Yes, i have done the quick test: before this patch: 2m56.561 2m50.651 2m51.220 2m52.199 2m48.066 After this patch: 2m51.194 2m55.980 2m50.755 2m47.396 2m46.807 It shows the new way is little better than the old way. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html