> -----Original Message-----
> From: owner-linux-mm@xxxxxxxxx <owner-linux-mm@xxxxxxxxx> On Behalf Of Matthew Wilcox
> Sent: Tuesday, August 20, 2019 3:21 PM
> To: Nitin Gupta <nigupta@xxxxxxxxxx>
> Cc: akpm@xxxxxxxxxxxxxxxxxxxx; vbabka@xxxxxxx; mgorman@xxxxxxxxxxxxxxxxxxx;
> mhocko@xxxxxxxx; dan.j.williams@xxxxxxxxx; Yu Zhao <yuzhao@xxxxxxxxxx>;
> Qian Cai <cai@xxxxxx>; Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx>;
> Roman Gushchin <guro@xxxxxx>; Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>;
> Kees Cook <keescook@xxxxxxxxxxxx>; Jann Horn <jannh@xxxxxxxxxx>;
> Johannes Weiner <hannes@xxxxxxxxxxx>; Arun KS <arunks@xxxxxxxxxxxxxx>;
> Janne Huttunen <janne.huttunen@xxxxxxxxx>; Konstantin Khlebnikov
> <khlebnikov@xxxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx
> Subject: Re: [RFC] mm: Proactive compaction
>
> On Fri, Aug 16, 2019 at 02:43:30PM -0700, Nitin Gupta wrote:
> > Testing done (on x86):
> >  - Set /sys/kernel/mm/compaction/order-9/extfrag_{low,high} = {25, 30}
> >    respectively.
> >  - Use a test program to fragment memory: the program allocates all
> >    memory and then for each 2M aligned section, frees 3/4 of base pages
> >    using munmap.
> >  - kcompactd0 detects fragmentation for order-9 > extfrag_high and
> >    starts compaction till extfrag < extfrag_low for order-9.
>
> Your test program is a good idea, but I worry it may produce unrealistically
> optimistic outcomes.  Page cache is readily reclaimable, so you're setting up
> a situation where 2MB pages can once again be produced.
>
> How about this:
>
> One program which creates a file several times the size of memory (or
> several files which total the same amount).  Then read the file(s).  Maybe by
> mmap(), and just do nice easy sequential accesses.
>
> A second program which causes slab allocations.  eg
>
> for (;;) {
> 	for (i = 0; i < n * 1000 * 1000; i++) {
> 		char fname[64];
>
> 		sprintf(fname, "/tmp/missing.%d", i);
> 		open(fname, O_RDWR);
> 	}
> }
>
> The first program should thrash the pagecache, causing pages to
> continuously be allocated, reclaimed and freed.  The second will create
> millions of dentries, causing the slab allocator to allocate a lot of
> order-0 pages which are harder to free.  If you really want to make it work
> hard, mix in opening some files which actually exist, preventing the pages
> which contain those dentries from being evicted.
>
> This feels like it's simulating a more normal workload than your test.
> What do you think?

This combination of workloads for mixing movable and unmovable pages sounds
good. I coded up these two and here's what I observed:

 - kernel: 5.3.0-rc5 + this patch, x86_64, 32G RAM.
 - Set extfrag_{low,high} = {25, 30} for order-9.
 - Ran the pagecache and dentry thrash test programs as you described:
    - pagecache test: mmap and sequentially read a 128G file on the 32G
      system (a rough sketch of the program I used is below).
    - dentry test: set n=100. I also created /tmp/missing.[0-10000] so
      those dentries stay allocated.
 - Started a Linux kernel compile for further pagecache thrashing.

With the above workload, fragmentation for order-9 stayed at 80-90%, which
kept kcompactd0 working, but it could not make progress due to the unmovable
pages from dentries. As expected, we keep hitting compaction_deferred() as
compaction attempts fail.

After a manual `echo 3 > /proc/sys/vm/drop_caches` and stopping the dentry
thrasher, kcompactd succeeded in bringing extfrag below the set thresholds.
With unmovable pages spread across memory, there is little compaction can do.
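For completeness, the pagecache side was along these lines (an illustrative
sketch, not the exact program I ran; the file path is a placeholder):

	/*
	 * Rough sketch of the pagecache thrasher: mmap a file much larger
	 * than RAM and read it sequentially, forever, so pagecache pages
	 * keep getting allocated, reclaimed and freed.
	 */
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <sys/stat.h>

	static volatile unsigned long sink;	/* keeps the reads from being optimized out */

	int main(int argc, char **argv)
	{
		const char *path = argc > 1 ? argv[1] : "/mnt/bigfile";	/* placeholder, e.g. a 128G file */
		struct stat st;
		unsigned char *p;
		size_t i;
		int fd = open(path, O_RDONLY);

		if (fd < 0 || fstat(fd, &st) < 0) {
			perror(path);
			return 1;
		}

		p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		/* Touch one byte per page, sequentially, in an endless loop. */
		for (;;)
			for (i = 0; i < (size_t)st.st_size; i += 4096)
				sink += p[i];
	}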
Maybe we should have a knob like 'compactness' (similar to 'swappiness')
which defines how aggressively compaction may behave. For high values, maybe
allow freeing dentries too? This way, hugepage-sensitive applications could
trade higher I/O latencies for better hugepage availability.

Thanks,
Nitin
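P.S. To make the 'compactness' idea a bit more concrete, here is a purely
hypothetical sketch of how such a knob might be tuned from userspace,
alongside the per-order extfrag thresholds this RFC already adds; the
vm.compactness path and value range below are invented for illustration and
do not exist in the patch:

	#include <stdio.h>

	/* Write a single integer to a sysctl/sysfs file. */
	static int write_knob(const char *path, int val)
	{
		FILE *f = fopen(path, "w");

		if (!f) {
			perror(path);
			return -1;
		}
		fprintf(f, "%d\n", val);
		return fclose(f);
	}

	int main(void)
	{
		/* Existing knobs from this RFC: per-order extfrag thresholds. */
		write_knob("/sys/kernel/mm/compaction/order-9/extfrag_low", 25);
		write_knob("/sys/kernel/mm/compaction/order-9/extfrag_high", 30);

		/*
		 * Hypothetical global knob (does not exist): 0 = never touch
		 * reclaimable slab, 100 = most aggressive, i.e. allow shrinking
		 * dentries to let compaction make progress.
		 */
		write_knob("/proc/sys/vm/compactness", 80);
		return 0;
	}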