Currently KVM pretends that pages with EPT mappings never got accessed. This has some side effects in the VM, like swapping out actively used guest pages and needlessly breaking up actively used hugepages. We can avoid those very costly side effects by emulating the accessed bit for EPT PTEs, which should only be slightly costly because pages pass through page_referenced infrequently. TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young(). This seems to help prevent KVM guests from being swapped out when they should not on my system. Signed-off-by: Rik van Riel <riel@xxxxxxxxxx> --- Jeff, does this patch fix the issue you saw a few months ago, with a 256MB KVM guest in a cgroup limited to 128GB memory? arch/x86/kvm/mmu.c | 10 ++++++++-- 1 files changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 89a49fb..6101615 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -856,9 +856,15 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp, u64 *spte; int young = 0; - /* always return old for EPT */ + /* + * Emulate the accessed bit for EPT, by checking if this page has + * an EPT mapping, and clearing it if it does. On the next access, + * a new EPT mapping will be established. + * This has some overhead, but not as much as the cost of swapping + * out actively used pages or breaking up actively used hugepages. + */ if (!shadow_accessed_mask) - return 0; + return kvm_unmap_rmapp(kvm, rmapp, data); spte = rmap_next(kvm, rmapp, NULL); while (spte) { -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html