On Fri, 23 Feb 2024 17:15:50 +1300 Barry Song <21cnbao@xxxxxxxxx> wrote:

> From: Barry Song <v-songbaohua@xxxxxxxx>
> 
> While doing MADV_PAGEOUT, the current code clears the PTE young bit
> so that vmscan won't read the young flag, allowing the reclamation
> of the madvised folios to go ahead.
> It seems we can do this by directly ignoring references instead,
> which lets us remove the TLB flush in madvise and the rmap overhead
> in vmscan.
> 
> Regarding the side effect: with the original code, if a parallel
> thread accesses the madvised memory while the other thread is doing
> madvise, the folios get a chance to be re-activated by vmscan. With
> the patch, they will still be reclaimed. But doing PAGEOUT and
> accessing the same memory at the same time is quite silly, almost
> like a DoS, so we probably don't need to care.

I think we might need to take care of that case, since users may rely
on only a best-effort estimation of the target pages, such as DAMON.
In such cases, the page-granularity re-check of the access could be
helpful.  So I am concerned that this could be a visible behavioral
change for some valid use cases.

> 
> A microbenchmark as below has shown a 6% reduction in the latency of
> MADV_PAGEOUT.

I assume some users may use MADV_PAGEOUT for proactive reclamation of
memory.  In that use case, I think the latency of MADV_PAGEOUT might
not be that important.  Hence the cons of the behavioral change might
outweigh the pros of the latency improvement, for such best-effort
proactive reclamation use cases.

Hope to hear and learn from others' opinions.

> 
> #include <stddef.h>
> #include <sys/mman.h>
> 
> #define PGSIZE 4096
> 
> int main(void)
> {
> 	int i;
> #define SIZE 512*1024*1024
> 	volatile long *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> 			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> 
> 	for (i = 0; i < SIZE/sizeof(long); i += PGSIZE / sizeof(long))
> 		p[i] = 0x11;
> 
> 	madvise((void *)p, SIZE, MADV_PAGEOUT);
> }
> 
>            w/o patch                      w/ patch
> root@10:~# time ./a.out       root@10:~# time ./a.out
> real    0m49.634s             real    0m46.334s
> user    0m0.637s              user    0m0.648s
> sys     0m47.434s             sys     0m44.265s
> 
> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>

Thanks,
SJ

[...]
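
For reference, below is a minimal user-space sketch of the "parallel
access during MADV_PAGEOUT" scenario discussed above.  It is
illustrative only: the region size, the toucher() helper and the thread
structure are arbitrary choices, not taken from the patch.  Without the
patch, folios the second thread keeps re-referencing can be
re-activated by vmscan; with the patch they are expected to be
reclaimed anyway because references are ignored.

/*
 * Illustrative sketch: one thread keeps touching the pages while the
 * main thread runs MADV_PAGEOUT on the same region.
 * Build with: gcc -O2 -pthread race.c
 */
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

#define PGSIZE 4096UL
#define SIZE   (512UL * 1024 * 1024)

static volatile long *region;
static volatile int done;

/* Keep re-touching every page while the main thread does MADV_PAGEOUT. */
static void *toucher(void *arg)
{
	size_t i;

	(void)arg;
	while (!done) {
		for (i = 0; i < SIZE / sizeof(long); i += PGSIZE / sizeof(long))
			region[i] = 0x22;
	}
	return NULL;
}

int main(void)
{
	pthread_t t;
	size_t i;

	region = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (region == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Fault every page in once. */
	for (i = 0; i < SIZE / sizeof(long); i += PGSIZE / sizeof(long))
		region[i] = 0x11;

	pthread_create(&t, NULL, toucher, NULL);

	/*
	 * Before the patch, folios re-referenced by the toucher can be
	 * re-activated by vmscan; with the patch they should still be
	 * reclaimed.
	 */
	if (madvise((void *)region, SIZE, MADV_PAGEOUT))
		perror("madvise");

	done = 1;
	pthread_join(t, NULL);
	return 0;
}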