On Fri, Dec 15, 2023 at 3:08 AM Prasad, Aravinda
<aravinda.prasad@xxxxxxxxx> wrote:
>
> > On Fri, Dec 15, 2023 at 12:42 AM Aravinda Prasad
> > <aravinda.prasad@xxxxxxxxx> wrote:
> > ...
> >
> > > This patch proposes profiling different levels of the application’s
> > > page table tree to detect whether a region is accessed or not. This
> > > patch is based on the observation that, when the accessed bit for a
> > > page is set, the accessed bits at the higher levels of the page table
> > > tree (PMD/PUD/PGD) corresponding to the path of the page table walk
> > > are also set. Hence, it is efficient to check the accessed bits at
> > > the higher levels of the page table tree to detect whether a region is
> > > accessed or not.
> >
> > This patch can crash on Xen. See commit 4aaf269c768d("mm: introduce
> > arch_has_hw_nonleaf_pmd_young()")
>
> Will fix as suggested in the commit.
>
> >
> > MGLRU already does this in the correct way. See mm/vmscan.c.
>
> I don't see access bits at PUD or PGD checked for 4K page size. Can you
> point me to the code where access bits are checked at PUD and PGD level?

There isn't any, because *the system* bottlenecks at the PTE level and
at moving memory between tiers. Optimizing at the PUD/PGD levels has
insignificant ROI for the system.

And food for thought:
1. Can a PUD/PGD cover memory from different tiers?
2. Can the A-bit in non-leaf entries work for EPT?

> > This patch also can cause USER DATA CORRUPTION. See commit
> > c11d34fa139e ("mm/damon/ops-common: atomically test and clear young
> > on ptes and pmds").
>
> Ok. Will atomically test and set the access bits.
>
> >
> > The quality of your patch makes me very much doubt the quality of your
> > paper, especially your results on Google's kstaled and MGLRU in table 6.2.
>
> The results are very much reproducible. We have not used kstaled/MGLRU for
> the data in Figure 3, but we linearly scan pages similar to kstaled by implementing
> a kernel thread for scanning.

You have not used MGLRU, and yet your results are very much reproducible.

> Our argument for kstaled/MGLRU is that, scanning individual pages at 4K
> granularity may not be efficient for large footprint applications.

Your argument for MGLRU is based on a wrong assumption, as I have
already pointed out.

> Instead,
> access bits at the higher level of the page table tree can be used. In the
> paper we have demonstrated this with DAMON but the concept can be
> applied to kstaled/MGLRU as well.

You got it backward: MGLRU introduced the concept; you fabricated a
comparison table.