On Wed, Jun 12, 2024 at 10:02 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Tue, Jun 11, 2024, James Houghton wrote:
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index e8fc5ecb59b2..24a3ff639919 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -870,13 +870,10 @@ static bool folio_referenced_one(struct folio *folio,
> >                         continue;
> >                 }
> >
> > -               if (pvmw.pte) {
> > -                       if (lru_gen_enabled() &&
> > -                           pte_young(ptep_get(pvmw.pte))) {
> > -                               lru_gen_look_around(&pvmw);
> > +               if (lru_gen_enabled() && pvmw.pte) {
> > +                       if (lru_gen_look_around(&pvmw))
> >                                 referenced++;
> > -                       }
> > -
> > +               } else if (pvmw.pte) {
> >                         if (ptep_clear_flush_young_notify(vma, address,
> >                                                 pvmw.pte))
> >                                 referenced++;
>
> Random question not really related to KVM/secondary MMU participation. AFAICT,
> the MGLRU approach doesn't flush TLBs after aging pages. How does MGLRU mitigate
> false negatives on pxx_young() due to the CPU not setting Accessed bits because
> of stale TLB entries?

I do think there can be false negatives, but we have not been able to
measure their practical impact since the flush was disabled on some host
MMUs long ago (NOT by MGLRU). E.g., on x86 and ppc,
ptep_clear_flush_young() is just ptep_test_and_clear_young().

The theoretical basis is that, given the TLB coverage trend (Figure 1 in
[1]), when a system is running out of memory, it's unlikely to have many
long-lived entries in its TLB. IOW, if that system had a stable working
set (hot memory) that could fit into its TLB, it wouldn't hit page
reclaim. Again, this is based on the theory (proposition) that, for most
systems, TLB coverage is much smaller than memory size.

If/when the above proposition doesn't hold, the next step in the page
reclaim path, which is to unmap the PTE, will cause a page fault. The
fault can be minor or major (requires IO), depending on the race between
the reclaiming and accessing threads.
In this case, the tradeoff, in a steady state, is between the PF cost of
pages we shouldn't reclaim and the flush cost of pages we scan. The PF
cost is higher than the flush cost per page. But we scan many pages and
only reclaim a few of them; pages we shouldn't reclaim are a (small)
portion of the latter.

[1] https://www.usenix.org/legacy/events/osdi02/tech/full_papers/navarro/navarro.pdf
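
To make the steady-state tradeoff above concrete, here is a rough
back-of-envelope sketch in C. It is not kernel code: struct reclaim_model,
its fields, and the three helper functions are all hypothetical names, and
any numbers plugged in are illustrative assumptions, not measurements. It
only encodes the argument that the flush cost scales with the pages
scanned, while the PF cost scales with the (small) wrongly reclaimed
portion of the pages reclaimed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost model; every field is an illustrative assumption. */
struct reclaim_model {
	double scanned;        /* pages scanned (aged) per reclaim pass */
	double reclaimed;      /* pages actually reclaimed, a few of those scanned */
	double false_neg_rate; /* fraction of reclaimed pages that were really hot */
	double flush_cost_ns;  /* assumed per-page TLB flush cost */
	double fault_cost_ns;  /* assumed per-page fault cost (minor or major) */
};

/* Total cost if every scanned page were flushed from the TLB. */
static double flush_total_ns(const struct reclaim_model *m)
{
	return m->scanned * m->flush_cost_ns;
}

/* Total cost of faults on pages that should not have been reclaimed. */
static double fault_total_ns(const struct reclaim_model *m)
{
	/* We can only reclaim pages that we scanned. */
	assert(m->reclaimed <= m->scanned);
	return m->reclaimed * m->false_neg_rate * m->fault_cost_ns;
}

/* True when skipping the flush is the cheaper choice under this model. */
static bool skipping_flush_wins(const struct reclaim_model *m)
{
	return fault_total_ns(m) < flush_total_ns(m);
}
```

For instance, with 100000 pages scanned, 1000 reclaimed, a 5% false
negative rate, and a fault assumed to cost 10x a flush, the model favors
skipping the flush by a wide margin. It flips only when the false negative
rate or the per-fault cost grows large enough to offset the scanned to
reclaimed ratio, e.g., when most wrongly reclaimed pages need major faults.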