On Tue, 16 Jun 2020 13:45:27 -0700 Nitin Gupta <nigupta@xxxxxxxxxx> wrote:

> For some applications, we need to allocate almost all memory as
> hugepages. However, on a running system, higher-order allocations can
> fail if the memory is fragmented. Linux kernel currently does on-demand
> compaction as we request more hugepages, but this style of compaction
> incurs very high latency. Experiments with one-time full memory
> compaction (followed by hugepage allocations) show that kernel is able
> to restore a highly fragmented memory state to a fairly compacted memory
> state within <1 sec for a 32G system. Such data suggests that a more
> proactive compaction can help us allocate a large fraction of memory as
> hugepages keeping allocation latencies low.
>
> ...
>

All looks straightforward to me and easy to disable if it goes wrong.

All the hard-coded magic numbers are a worry, but such is life.

One teeny complaint:

>
> ...
>
> @@ -2650,12 +2801,34 @@ static int kcompactd(void *p)
>  		unsigned long pflags;
>
>  		trace_mm_compaction_kcompactd_sleep(pgdat->node_id);
> -		wait_event_freezable(pgdat->kcompactd_wait,
> -				kcompactd_work_requested(pgdat));
> +		if (wait_event_freezable_timeout(pgdat->kcompactd_wait,
> +			kcompactd_work_requested(pgdat),
> +			msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC))) {
> +
> +			psi_memstall_enter(&pflags);
> +			kcompactd_do_work(pgdat);
> +			psi_memstall_leave(&pflags);
> +			continue;
> +		}
>
> -		psi_memstall_enter(&pflags);
> -		kcompactd_do_work(pgdat);
> -		psi_memstall_leave(&pflags);
> +		/* kcompactd wait timeout */
> +		if (should_proactive_compact_node(pgdat)) {
> +			unsigned int prev_score, score;

Everywhere else, scores have type `int'.  Here they are unsigned.
How come?

Would it be better to make these unsigned throughout?  I don't think a
score can ever be negative?
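
To illustrate why the mixed types are worth tidying up: when an `int' and
an `unsigned int' meet in a comparison, C converts the signed operand to
unsigned first. The following is only a standalone userspace sketch, not
kernel code; both helper names are hypothetical, and the negative input
stands for a value that should never occur but would be silently
misread if it did.

```c
/*
 * Hypothetical helpers, for illustration only. Each answers the question
 * "did the score go down?", once with both operands signed and once with
 * a signed previous score compared against an unsigned current score.
 */
static int score_went_down_signed(int prev_score, int score)
{
	return score < prev_score;
}

static int score_went_down_mixed(int prev_score, unsigned int score)
{
	/*
	 * The cast spells out what the compiler does implicitly in a mixed
	 * comparison: a negative prev_score becomes a very large unsigned
	 * value, so almost any score looks like "progress".
	 */
	return score < (unsigned int)prev_score;
}
```

For non-negative inputs the two helpers agree, which is why the mismatch
is harmless as long as a score really cannot go negative. They disagree
only for a negative `prev_score', which is one argument for picking a
single type throughout.
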
> +			if (proactive_defer) {
> +				proactive_defer--;
> +				continue;
> +			}
> +			prev_score = fragmentation_score_node(pgdat);
> +			proactive_compact_node(pgdat);
> +			score = fragmentation_score_node(pgdat);
> +			/*
> +			 * Defer proactive compaction if the fragmentation
> +			 * score did not go down i.e. no progress made.
> +			 */
> +			proactive_defer = score < prev_score ?
> +					0 : 1 << COMPACT_MAX_DEFER_SHIFT;
> +		}
>  	}
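
The deferral logic in the quoted hunk can be read in isolation as a small
countdown. The sketch below is a userspace approximation, not the patch
itself: `struct defer_state' and the three helpers are hypothetical names,
and the value 6 for COMPACT_MAX_DEFER_SHIFT is an assumption about the
kernel's definition rather than something stated in this thread.

```c
#include <stdbool.h>

/* Assumption: mirrors the kernel's definition, believed to be 6. */
#define COMPACT_MAX_DEFER_SHIFT 6

/* Hypothetical stand-in for the counter kept in the kcompactd loop. */
struct defer_state {
	unsigned int proactive_defer;	/* wait timeouts still to skip */
};

/*
 * Called once per wait timeout. Returns true if a proactive compaction
 * pass may run now, false if this timeout is consumed by a deferral.
 */
static bool should_run_pass(struct defer_state *st)
{
	if (st->proactive_defer) {
		st->proactive_defer--;
		return false;
	}
	return true;
}

/* Record a pass: defer further passes only if the score did not drop. */
static void record_pass(struct defer_state *st,
			unsigned int prev_score, unsigned int score)
{
	st->proactive_defer = score < prev_score ?
			0 : 1U << COMPACT_MAX_DEFER_SHIFT;
}

/* Count the timeouts skipped before the next pass is allowed. */
static unsigned int timeouts_skipped(struct defer_state *st)
{
	unsigned int skipped = 0;

	while (!should_run_pass(st))
		skipped++;
	return skipped;
}
```

Under that assumption, a pass that makes no progress causes the next
1 << 6 = 64 timeouts to be skipped, while a pass that lowers the score
lets the very next timeout compact again.
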