On Fri, Sep 23, 2022 at 12:25:00PM -0700, Jim Mattson wrote:
> On Fri, Sep 23, 2022 at 3:16 AM Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
> >
> > Because of this, when the guest clears the accessed bit in its nested EPT entries, KVM doesn't
> > notice/intercept it and corresponding EPT sptes remain the same, thus later the guest access to
> > the memory is not intercepted and because of this doesn't turn back
> > the accessed bit in the guest EPT tables.
>
> Does the guest execute an INVEPT after clearing the accessed bit?

No, that's the problem.  In L1, access_tracking_perf_test is using page_idle to
mark guest memory as idle, which results in clear_young() notifiers being sent
to KVM to clear the accessed bits.  clear_young() is explicitly allowed to omit
TLB flushes, so KVM happily obliges.

	/*
	 * clear_young is a lightweight version of clear_flush_young. Like the
	 * latter, it is supposed to test-and-clear the young/accessed bitflag
	 * in the secondary pte, but it may omit flushing the secondary tlb.
	 */
	int (*clear_young)(struct mmu_notifier *subscription,
			   struct mm_struct *mm,
			   unsigned long start,
			   unsigned long end);

We could modify page_idle so that KVM performs TLB flushes.  For example, add a
mechanism for userspace to trigger a TLB flush.  Or change page_idle to use
clear_flush_young() (although that would be incredibly expensive since page_idle
only allows clearing one pfn at a time).  But I'm not sure creating a new
userspace API just for this test is really worth it, especially with multigen
LRU coming soon.
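
For reference, a rough sketch of how userspace marks a single pfn idle through
the page_idle bitmap, which is what ends up triggering clear_young().  This
assumes the documented /sys/kernel/mm/page_idle/bitmap layout (an array of
64-bit words, one bit per pfn); the helper names are made up for illustration
and are not taken from the selftest or the kernel.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helpers, named for illustration only. */

/* Byte offset of the 64-bit bitmap word that covers @pfn. */
static off_t page_idle_offset(uint64_t pfn)
{
	return (off_t)(pfn / 64) * 8;
}

/* Bit within that word that corresponds to @pfn. */
static uint64_t page_idle_mask(uint64_t pfn)
{
	return 1ULL << (pfn % 64);
}

/*
 * Mark one pfn idle.  @fd is assumed to be an open, writable descriptor for
 * /sys/kernel/mm/page_idle/bitmap.  Returns 0 on success, -1 on failure.
 */
static int mark_pfn_idle(int fd, uint64_t pfn)
{
	uint64_t bits = page_idle_mask(pfn);
	ssize_t ret = pwrite(fd, &bits, sizeof(bits), page_idle_offset(pfn));

	return ret == (ssize_t)sizeof(bits) ? 0 : -1;
}
```

Writing the bitmap needs root and a kernel built with
CONFIG_IDLE_PAGE_TRACKING.  Nothing in this path gives userspace a way to
request a TLB flush, which is the gap discussed above.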