> On Aug 15, 2022, at 12:13 AM, Yu Zhao <yuzhao@xxxxxxxxxx> wrote: > > Searching the rmap for PTEs mapping each page on an LRU list (to test > and clear the accessed bit) can be expensive because pages from > different VMAs (PA space) are not cache friendly to the rmap (VA > space). For workloads mostly using mapped pages, searching the rmap > can incur the highest CPU cost in the reclaim path. Impressive work. Sorry if my feedback is not timely. Just one minor point for thought, that can be left for a later cleanup. > > + for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) { > + unsigned long pfn; > + > + pfn = get_pte_pfn(pte[i], pvmw->vma, addr); > + if (pfn == -1) > + continue; > + > + if (!pte_young(pte[i])) > + continue; > + > + folio = get_pfn_folio(pfn, memcg, pgdat); > + if (!folio) > + continue; > + > + if (!ptep_test_and_clear_young(pvmw->vma, addr, pte + i)) > + continue; > + You have already checked that the PTE is old (not young), so this check seems redundant. I do not see a way in which the access-bit can be cleared since you hold the ptl. IOW, there is no need for the “if" and “continue". Makes me also wonder whether having a separate ptep_clear_young() can slightly help, since anyhow the access-bit is more of an estimation, and having a separate ptep_clear_young() can enable optimizations. On x86, for instance, if the PTE is dirty, we may be able to clear the access-bit without an atomic operation, which should be faster.