On Mon, Jun 6, 2022 at 3:25 AM Barry Song <21cnbao@xxxxxxxxx> wrote:
>
> On Wed, May 18, 2022 at 4:49 PM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:

...

> > @@ -821,6 +822,12 @@ static bool folio_referenced_one(struct folio *folio,
> >                 }
> >
> >                 if (pvmw.pte) {
> > +                       if (lru_gen_enabled() && pte_young(*pvmw.pte) &&
> > +                           !(vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))) {
> > +                               lru_gen_look_around(&pvmw);
> > +                               referenced++;
> > +                       }
> > +
> >                         if (ptep_clear_flush_young_notify(vma, address,
>
> Hello, Yu.
> look_around() is calling ptep_test_and_clear_young(pvmw->vma, addr, pte + i)
> only without flush and notify. for flush, there is a tlb operation for arm64:
> static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
>                                          unsigned long address, pte_t *ptep)
> {
>         int young = ptep_test_and_clear_young(vma, address, ptep);
>
>         if (young) {
>                 /*
>                  * We can elide the trailing DSB here since the worst that can
>                  * happen is that a CPU continues to use the young entry in its
>                  * TLB and we mistakenly reclaim the associated page. The
>                  * window for such an event is bounded by the next
>                  * context-switch, which provides a DSB to complete the TLB
>                  * invalidation.
>                  */
>                 flush_tlb_page_nosync(vma, address);
>         }
>
>         return young;
> }
>
> Does it mean the current kernel is over cautious?

Hi Barry,

This is up to individual archs. For x86, ptep_clear_flush_young() is
ptep_test_and_clear_young(). For arm64, I'd say yes, based on Figure 1
of Navarro, Juan, et al. "Practical, transparent operating system
support for superpages." [1].

int ptep_clear_flush_young(struct vm_area_struct *vma,
                           unsigned long address, pte_t *ptep)
{
        /*
         * On x86 CPUs, clearing the accessed bit without a TLB flush
         * doesn't cause data corruption. [ It could cause incorrect
         * page aging and the (mistaken) reclaim of hot pages, but the
         * chance of that should be relatively low. ]
         *
         * So as a performance optimization don't flush the TLB when
         * clearing the accessed bit, it will eventually be flushed by
         * a context switch or a VM operation anyway. [ In the rare
         * event of it not getting flushed for a long time the delay
         * shouldn't really matter because there's no real memory
         * pressure for swapout to react to. ]
         */
        return ptep_test_and_clear_young(vma, address, ptep);
}

[1] https://www.usenix.org/legacy/events/osdi02/tech/full_papers/navarro/navarro.pdf

> is it
> safe to call ptep_test_and_clear_young() only?

Yes. Though the h/w A-bit is designed to allow OSes to skip TLB
flushes when unmapping, the Linux kernel doesn't do this.

> btw, lru_gen_look_around() has already included 'address', are we doing
> pte check for 'address' twice here?

Yes for host MMU but no KVM MMU. ptep_clear_flush_young_notify() goes
into the MMU notifier. We don't use the _notify variant in
lru_gen_look_around() because GPA space generally exhibits no memory
locality.

Thanks.
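
To make the last two answers concrete, here is a minimal user-space sketch of the interaction. It is a toy model, not kernel code: struct toy_mm and every toy_* function are hypothetical names invented for illustration, the "TLB flush" is just a counter, and the secondary (KVM) MMU is reduced to a second array of booleans. It only shows that a look-around pass clears the host accessed bits without flushing or notifying, so the later _notify call re-checks the same host PTE but can still pick up a reference recorded by the secondary MMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
#define TOY_NR_PTES 8

struct toy_mm {
	bool host_young[TOY_NR_PTES];	/* accessed bit of each host PTE */
	bool kvm_young[TOY_NR_PTES];	/* accessed state kept by a secondary MMU */
	unsigned int tlb_flushes;	/* number of simulated TLB flushes */
};

/* Stand-in for ptep_test_and_clear_young(): return the old bit, clear it, no flush. */
static bool toy_test_and_clear_young(struct toy_mm *mm, size_t i)
{
	bool young;

	assert(i < TOY_NR_PTES);
	young = mm->host_young[i];
	mm->host_young[i] = false;
	return young;
}

/* Stand-in for the arm64-style ptep_clear_flush_young(): flush only if the bit was set. */
static bool toy_clear_flush_young(struct toy_mm *mm, size_t i)
{
	bool young = toy_test_and_clear_young(mm, i);

	if (young)
		mm->tlb_flushes++;
	return young;
}

/* Stand-in for the _notify variant: host PTE first, then the secondary MMU. */
static bool toy_clear_flush_young_notify(struct toy_mm *mm, size_t i)
{
	bool young = toy_clear_flush_young(mm, i);

	young |= mm->kvm_young[i];
	mm->kvm_young[i] = false;
	return young;
}

/*
 * Stand-in for the look-around pass: clear host accessed bits in
 * [start, end) without flushing and without touching the secondary MMU.
 * Returns how many entries were young.
 */
static unsigned int toy_look_around(struct toy_mm *mm, size_t start, size_t end)
{
	unsigned int n = 0;

	for (size_t i = start; i < end && i < TOY_NR_PTES; i++)
		n += toy_test_and_clear_young(mm, i);
	return n;
}
```

In this model, calling toy_look_around() over a range and then toy_clear_flush_young_notify() on an index inside that range checks the host entry a second time (it is already clear, so no flush is counted), while a reference held only in kvm_young[] is still reported by the second call.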