> -----Original Message-----
> From: owner-linux-mm@xxxxxxxxx <owner-linux-mm@xxxxxxxxx> On Behalf Of
> David Rientjes
> Sent: Monday, September 16, 2019 1:17 PM
> To: Nitin Gupta <nigupta@xxxxxxxxxx>
> Cc: akpm@xxxxxxxxxxxxxxxxxxxx; vbabka@xxxxxxx;
> mgorman@xxxxxxxxxxxxxxxxxxx; mhocko@xxxxxxxx; dan.j.williams@xxxxxxxxx;
> Yu Zhao <yuzhao@xxxxxxxxxx>; Matthew Wilcox <willy@xxxxxxxxxxxxx>;
> Qian Cai <cai@xxxxxx>; Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx>;
> Roman Gushchin <guro@xxxxxx>; Greg Kroah-Hartman
> <gregkh@xxxxxxxxxxxxxxxxxxx>; Kees Cook <keescook@xxxxxxxxxxxx>;
> Jann Horn <jannh@xxxxxxxxxx>; Johannes Weiner <hannes@xxxxxxxxxxx>;
> Arun KS <arunks@xxxxxxxxxxxxxx>; Janne Huttunen
> <janne.huttunen@xxxxxxxxx>; Konstantin Khlebnikov
> <khlebnikov@xxxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> linux-mm@xxxxxxxxx
> Subject: Re: [RFC] mm: Proactive compaction
>
> On Fri, 16 Aug 2019, Nitin Gupta wrote:
>
> > For some applications we need to allocate almost all memory as
> > hugepages. However, on a running system, higher-order allocations can
> > fail if the memory is fragmented. The Linux kernel currently does
> > on-demand compaction as we request more hugepages, but this style of
> > compaction incurs very high latency. Experiments with one-time full
> > memory compaction (followed by hugepage allocations) show that the
> > kernel is able to restore a highly fragmented memory state to a fairly
> > compacted state in under 1 second for a 32G system. Such data suggests
> > that a more proactive compaction can help us allocate a large fraction
> > of memory as hugepages while keeping allocation latencies low.
> >
> > For more proactive compaction, the approach taken here is to define
> > per page-order external fragmentation thresholds and let kcompactd
> > threads act on these thresholds.
> >
> > The low and high thresholds are defined per page-order and exposed
> > through sysfs:
> >
> >   /sys/kernel/mm/compaction/order-[1..MAX_ORDER]/extfrag_{low,high}
> >
> > The per-node kcompactd thread is woken up every few seconds to check
> > if any zone on its node has extfrag above the extfrag_high threshold
> > for any order, in which case the thread starts compaction in the
> > background until all zones are below the extfrag_low level for all
> > orders. By default both these thresholds are set to 100 for all
> > orders, which essentially disables kcompactd.
> >
> > To avoid wasting CPU cycles when compaction cannot help, such as when
> > memory is full, we check both extfrag > extfrag_high and
> > compaction_suitable(zone). This allows the kcompactd thread to stay
> > inactive even if the extfrag thresholds are not met.
> >
> > This patch is largely based on ideas from Michal Hocko posted here:
> > https://lore.kernel.org/linux-mm/20161230131412.GI13301@xxxxxxxxxxxxxx/
> >
> > Testing done (on x86):
> > - Set /sys/kernel/mm/compaction/order-9/extfrag_{low,high} = {25, 30}
> >   respectively.
> > - Use a test program to fragment memory: the program allocates all
> >   memory and then, for each 2M-aligned section, frees 3/4 of the base
> >   pages using munmap.
> > - kcompactd0 detects fragmentation for order-9 > extfrag_high and
> >   starts compaction until extfrag < extfrag_low for order-9.
> >
> > The patch has plenty of rough edges but I'm posting it early to see if
> > I'm going in the right direction and to get some early feedback.
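
As an aside on the test setup quoted above: a minimal userspace sketch of
such a fragmenter is below. The region size, the alignment handling, and
all names are illustrative assumptions; this is not the exact program used
for the quoted numbers.

/*
 * Rough sketch of a memory fragmenter along the lines described above:
 * map and populate a large anonymous region, then in every 2M-aligned
 * section munmap 3 out of every 4 base pages, leaving memory badly
 * fragmented from an order-9 point of view.
 */
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        size_t page = (size_t)sysconf(_SC_PAGESIZE); /* base page, typically 4K */
        size_t huge = 2UL << 20;                     /* 2M section */
        size_t len  = 16UL << 30;                    /* amount to fragment; pick per system */
        char *raw, *p;

        /* Over-allocate by one section so the start can be rounded up to 2M. */
        raw = mmap(NULL, len + huge, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
        if (raw == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        p = (char *)(((uintptr_t)raw + huge - 1) & ~(uintptr_t)(huge - 1));

        /* In each 2M section keep every 4th base page and unmap the rest. */
        for (size_t off = 0; off < len; off += huge)
                for (size_t i = 0; i < huge / page; i++)
                        if (i % 4)
                                munmap(p + off + i * page, page);

        pause();        /* hold the remaining mappings while compaction runs */
        return 0;
}

Once it is running, per-order fragmentation can be watched through
/proc/buddyinfo while kcompactd does its work.
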
> Is there an update to this proposal or non-RFC patch that has been
> posted for proactive compaction?

I recently posted a non-RFC patch for proactive compaction:

https://lkml.org/lkml/2019/11/15/1099

Please let me know if you try it out or if you have any feedback.

Thanks,
Nitin

> We've had good success with periodically compacting memory on a regular
> cadence on systems with hugepages enabled. The cadence itself is defined
> by the admin, but it causes khugepaged[*] to periodically wake up and
> invoke compaction in an attempt to keep zones as defragmented as
> possible (perhaps more "proactive" than what is proposed here, in an
> attempt to keep all memory as unfragmented as possible regardless of
> extfrag thresholds). It also avoids corner cases where kcompactd could
> become more expensive than anticipated because it is unsuccessful at
> compacting memory yet the extfrag threshold is still exceeded.
>
> [*] khugepaged instead of kcompactd only because this is only enabled
>     for systems where transparent hugepages are enabled; probably
>     better off in kcompactd to avoid duplicating work between two
>     kthreads if there is already a need for background compaction.
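
As a side note on the fixed-cadence approach: a similar effect can be
approximated from userspace today by periodically writing to the existing
/proc/sys/vm/compact_memory knob (available with CONFIG_COMPACTION). A
rough sketch, with an arbitrary interval chosen purely for illustration:

/*
 * Rough userspace approximation of a fixed compaction cadence: write "1"
 * to /proc/sys/vm/compact_memory every `interval` seconds, which asks the
 * kernel to compact all zones. Needs root; the interval is an example.
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        const unsigned int interval = 300;      /* seconds between full compactions */

        for (;;) {
                FILE *f = fopen("/proc/sys/vm/compact_memory", "w");

                if (f) {
                        fputs("1\n", f);        /* trigger full compaction */
                        fclose(f);
                } else {
                        perror("/proc/sys/vm/compact_memory");
                }
                sleep(interval);
        }
        return 0;
}

The obvious difference from the threshold-driven approach is that this
compacts unconditionally on every tick, regardless of how fragmented
memory actually is, which is the kind of wasted work the extfrag_{low,high}
checks are meant to avoid.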