On Monday 31 August 2009 11:55:24 Avi Kivity wrote:
> On 08/28/2009 05:31 AM, Max Laier wrote:
> > Hello,
> >
> > it seems to me that the reclaim mechanism for shadow page table pages is
> > sub-optimal.  The arch.active_mmu_pages list that is used for reclaiming
> > does not move up parent shadow page tables when a child is added so when
> > we need a new shadow page we zap the oldest - which can well be a
> > directory level page holding a just added table level page.
> >
> > Attached is a proof-of-concept diff and two plots before and after.  The
> > plots show referenced guest pages over time.
>
> What do you mean by referenced guest pages?  Total number of populated
> sptes?

Yes.

> > As you can see there is less saw-toothing in the after plot and also
> > fewer changes overall (because we don't zap mappings that are still in
> > use as often).  This is with a limit of 64 for the shadow page table to
> > increase the effect and vmx/ept.
> >
> > I realize that the list_move and parent walk are quite expensive and that
> > kvm_mmu_alloc_page is only half the story.  It should really be done
> > every time a new guest page table is mapped - maybe via rmap_add.  This
> > would obviously completely kill performance-wise, though.
> >
> > Another idea would be to improve the reclaim logic in a way that it
> > prefers "old" PT_PAGE_TABLE_LEVEL over directories.  Though I'm not sure
> > how to code that up sensibly, either.
> >
> > As I said, this is proof-of-concept and RFC.  So any comments welcome.
> > For my use case the proof-of-concept diff seems to do well enough,
> > though.
>
> Given that reclaim is fairly rare, we should try to move the cost
> there.  So how about this:
>
> - add an 'accessed' flag to struct kvm_mmu_page
> - when reclaiming, try to evict pages that were not recently accessed
>   (but don't overscan - if you scan many recently accessed pages, evict
>   some of them anyway)

- prefer page table level pages over directory level pages in the face of
  overscan.

> - when scanning, update the accessed flag with the accessed bit of all
>   parent_ptes

I might be misunderstanding, but I think it should be the other way
'round, i.e. a page is accessed if any of its children have been accessed.

> - when dropping an spte, update the accessed flag of the kvm_mmu_page it
>   points to
> - when reloading cr3, mark the page as accessed (since it has no
>   parent_ptes)
>
> This should introduce some LRU-ness that depends not only on fault
> behaviour but also on long-term guest access behaviour (which is
> important for long-running processes and kernel pages).

I'll try to come up with a patch for this later tonight, unless you
already have something in the making.  Thanks.

-- 
/"\  Best regards,                      | mlaier@xxxxxxxxxxx
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
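
For illustration only, the reclaim policy discussed in this thread could be
sketched in userspace C roughly as below.  This is not KVM code and not the
patch mentioned above: struct shadow_page, OVERSCAN_LIMIT, pick_victim() and
mark_accessed() are hypothetical names, the list is modelled as an array
ordered oldest first, and each page is assumed to have a single parent
(real shadow pages can have several parent_ptes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_mmu_page; only the fields the
 * policy needs.  level 1 = page table level, >1 = directory level. */
struct shadow_page {
    int level;
    bool accessed;
    struct shadow_page *parent;   /* assumed single parent, NULL for a root */
};

/* Assumed tunable: how many recently accessed pages to skip at most. */
#define OVERSCAN_LIMIT 4

/* The "other way 'round" reading: a page counts as accessed if any of
 * its children was accessed, so the flag is propagated upward. */
void mark_accessed(struct shadow_page *sp)
{
    for (; sp != NULL; sp = sp->parent)
        sp->accessed = true;
}

/*
 * Pick the index of the page to evict from pages[0..n-1] (oldest first).
 * Pages with the accessed flag set are skipped and have the flag cleared
 * (second chance).  If the scan ends without finding an unaccessed page,
 * either by hitting OVERSCAN_LIMIT or the end of the array, one of the
 * skipped pages is evicted anyway, preferring the oldest page table
 * level page over a directory.
 */
size_t pick_victim(struct shadow_page *pages, size_t n)
{
    size_t skipped = 0;
    size_t fallback = 0;          /* oldest page if no leaf was seen */
    bool have_leaf = false;

    assert(n > 0);

    for (size_t i = 0; i < n; i++) {
        if (!pages[i].accessed)
            return i;             /* not recently used: evict it */

        if (!have_leaf && pages[i].level == 1) {
            fallback = i;         /* remember oldest leaf-level page */
            have_leaf = true;
        }
        pages[i].accessed = false;

        if (++skipped >= OVERSCAN_LIMIT)
            break;                /* overscan: stop and evict anyway */
    }
    return fallback;
}
```

The cost of the scan is paid only at reclaim time, which matches the point
that reclaim is fairly rare.  Whether the flag should flow from parents to
children or the reverse is the open question in the thread; the sketch
takes the child-to-parent direction purely as an assumption.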