Re: RFC: shadow page table reclaim

Max Laier <max@xxxxxxxxxx> · Mon, 31 Aug 2009 14:09:09 +0200

On Monday 31 August 2009 11:55:24 Avi Kivity wrote:
> On 08/28/2009 05:31 AM, Max Laier wrote:
> > Hello,
> >
> > it seems to me that the reclaim mechanism for shadow page table pages is
> > suboptimal.  The arch.active_mmu_pages list that is used for reclaiming
> > does not move up parent shadow page tables when a child is added, so when
> > we need a new shadow page we zap the oldest - which may well be a
> > directory level page holding a just-added table level page.
> >
> > Attached is a proof-of-concept diff and two plots before and after.  The
> > plots show referenced guest pages over time.
>
> What do you mean by referenced guest pages?  Total number of populated
> sptes?

Yes.

> > As you can see there is less saw-toothing in the after plot and also
> > fewer changes overall (because we don't zap mappings that are still in
> > use as often).  This is with a limit of 64 for the shadow page table (to
> > increase the effect) and with vmx/ept.
> >
> > I realize that the list_move and parent walk are quite expensive and that
> > kvm_mmu_alloc_page is only half the story.  It should really be done
> > every time a new guest page table is mapped - maybe via rmap_add.  That
> > would obviously kill performance, though.
> >
> > Another idea would be to improve the reclaim logic in a way that it
> > prefers "old" PT_PAGE_TABLE_LEVEL over directories.  Though I'm not sure
> > how to code that up sensibly, either.
> >
> > As I said, this is proof-of-concept and RFC.  So any comments welcome. 
> > For my use case the proof-of-concept diff seems to do well enough,
> > though.
>
> Given that reclaim is fairly rare, we should try to move the cost
> there.  So how about this:
>
> - add an 'accessed' flag to struct kvm_mmu_page
> - when reclaiming, try to evict pages that were not recently accessed
> (but don't overscan - if you scan many recently accessed pages, evict
> some of them anyway)

- prefer page table level pages over directory level pages in the face of 
overscan.
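
To make the eviction order concrete, here is a rough userspace sketch, not
code against mmu.c.  struct shadow_page, pick_victim() and SCAN_LIMIT are
made-up names for illustration, and clearing the flag while scanning (a
second-chance scheme) is an assumption, not something settled in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_mmu_page; not the real layout. */
struct shadow_page {
    int level;      /* 1 = page table level, >1 = directory level */
    bool accessed;  /* set if the page was used since the last scan */
};

#define SCAN_LIMIT 4  /* arbitrary overscan bound, for illustration only */

/*
 * pages[] is ordered oldest first, like walking active_mmu_pages from the
 * reclaim end.  Returns the index of the page to evict.
 */
size_t pick_victim(struct shadow_page *pages, size_t n)
{
    size_t limit = n < SCAN_LIMIT ? n : SCAN_LIMIT;
    size_t fallback = 0;     /* oldest page, used if no leaf was scanned */
    bool have_leaf = false;

    assert(n > 0);

    for (size_t i = 0; i < limit; i++) {
        if (!pages[i].accessed)
            return i;               /* oldest page not recently accessed */
        pages[i].accessed = false;  /* second chance: clear and move on */
        if (!have_leaf && pages[i].level == 1) {
            fallback = i;           /* oldest leaf among the scanned pages */
            have_leaf = true;
        }
    }

    /* Overscan: everything scanned was accessed, so prefer a leaf page. */
    return fallback;
}
```

With the scan bounded like this the cost stays in the (rare) reclaim path,
and a directory only goes first when no table level page was seen at all.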

> - when scanning, update the accessed flag with the accessed bit of all
> parent_ptes

I might be misunderstanding, but I think it should be the other way 'round,
i.e. a page is accessed if any of its children have been accessed.
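
A small userspace sketch of that direction, purely for illustration.  struct
sp_node and fold_accessed() are hypothetical names; the real kvm_mmu_page
tracks parent_ptes (child to parent) rather than an array of children, so
this only shows the propagation rule, not how it would be wired up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4  /* arbitrary bound for the sketch */

/* Hypothetical shadow page node with explicit child links. */
struct sp_node {
    bool accessed;
    size_t nr_children;
    struct sp_node *children[MAX_CHILDREN];
};

/*
 * Mark sp accessed if any page below it was accessed.  Every child is
 * visited (no early exit) so that each subtree's flag is brought up to date.
 * Returns the resulting value of sp->accessed.
 */
bool fold_accessed(struct sp_node *sp)
{
    assert(sp != NULL);
    assert(sp->nr_children <= MAX_CHILDREN);

    for (size_t i = 0; i < sp->nr_children; i++) {
        if (fold_accessed(sp->children[i]))
            sp->accessed = true;
    }
    return sp->accessed;
}
```

This way a directory whose only recent activity is in a table level page
below it still counts as accessed and is not picked ahead of its children.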

> - when dropping an spte, update the accessed flag of the kvm_mmu_page it
> points to
> - when reloading cr3, mark the page as accessed (since it has no
> parent_ptes)
>
> This should introduce some LRU-ness that depends not only on fault
> behaviour but also on long-term guest access behaviour (which is
> important for long-running processes and kernel pages).

I'll try to come up with a patch for this later tonight, unless you already
have something in the making.  Thanks.

-- 
/"\  Best regards,                      | mlaier@xxxxxxxxxxx
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News