On Tue, 2019-08-20 at 10:46 +0200, Vlastimil Babka wrote:
> > This patch is largely based on ideas from Michal Hocko posted here:
> > https://lore.kernel.org/linux-mm/20161230131412.GI13301@xxxxxxxxxxxxxx/
> >
> > Testing done (on x86):
> >  - Set /sys/kernel/mm/compaction/order-9/extfrag_{low,high} = {25, 30}
> >    respectively.
> >  - Use a test program to fragment memory: the program allocates all
> >    memory and then for each 2M aligned section, frees 3/4 of base pages
> >    using munmap.
> >  - kcompactd0 detects fragmentation for order-9 > extfrag_high and starts
> >    compaction till extfrag < extfrag_low for order-9.
> >
> > The patch has plenty of rough edges but posting it early to see if I'm
> > going in the right direction and to get some early feedback.
>
> That's a lot of control knobs - how is an admin supposed to tune them to
> their needs?

Yes, it's difficult for an admin to get so many tunables right unless
targeting a very specific workload.

How about a simpler solution where we expose just one tunable per node:
   /sys/.../node-x/compaction_effort
which accepts [0, 100]

This parallels /proc/sys/vm/swappiness but for compaction. With this
single number, we can estimate per-order [low, high] watermarks for
external fragmentation like this:
 - For now, map this range to [low, medium, high], each of which
   corresponds to specific low and high thresholds for extfrag.
 - Apply more relaxed thresholds for higher orders than for lower orders.

With this single tunable we remove the burden of setting explicit
per-order [low, high] thresholds, and it should be easier to experiment
with.

-Nitin
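
A rough userspace sketch of the mapping proposed above, to make the idea
concrete. Everything in it is an illustrative assumption rather than code
from the patch: the names, the bucket boundaries (34 and 67), the base
thresholds, and the 2-point-per-order relaxation. It also assumes that
"more relaxed" means higher extfrag thresholds, i.e. more fragmentation
is tolerated before compaction starts.

```c
#include <assert.h>

/* All names and numbers below are hypothetical, chosen for illustration. */
enum effort_level { EFFORT_LOW, EFFORT_MEDIUM, EFFORT_HIGH };

struct extfrag_thresholds {
	unsigned int low;	/* stop compacting once extfrag falls below this */
	unsigned int high;	/* start compacting once extfrag rises above this */
};

#define SKETCH_MAX_ORDER 10U	/* assumed upper bound on the order */

/* Bucket the [0, 100] tunable into three levels (cut points are assumed). */
enum effort_level effort_to_level(unsigned int effort)
{
	if (effort > 100)
		effort = 100;
	if (effort < 34)
		return EFFORT_LOW;
	if (effort < 67)
		return EFFORT_MEDIUM;
	return EFFORT_HIGH;
}

/* Derive per-order [low, high] thresholds from the single tunable. */
struct extfrag_thresholds effort_to_thresholds(unsigned int effort,
					       unsigned int order)
{
	/* Higher effort tolerates less fragmentation (assumed base values). */
	static const struct extfrag_thresholds base[] = {
		[EFFORT_LOW]    = { .low = 60, .high = 70 },
		[EFFORT_MEDIUM] = { .low = 40, .high = 50 },
		[EFFORT_HIGH]   = { .low = 20, .high = 30 },
	};
	struct extfrag_thresholds t = base[effort_to_level(effort)];

	if (order > SKETCH_MAX_ORDER)
		order = SKETCH_MAX_ORDER;

	/* Relax both thresholds by 2 points per order (assumed step). */
	t.low += 2 * order;
	t.high += 2 * order;
	if (t.high > 100)
		t.high = 100;

	assert(t.low <= t.high);
	return t;
}
```

With these assumed numbers, compaction_effort = 100 at order 9 yields
[38, 48], while compaction_effort = 0 at order 9 yields [78, 88].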