Re: [PATCH v4] mm/compaction: let proactive compaction order configurable

"Chu,Kaiping" <chukaiping@xxxxxxxxx> · Mon, 10 May 2021 02:10:46 +0000

-----Original Message-----
From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Sent: May 10, 2021 8:18
To: Chu,Kaiping <chukaiping@xxxxxxxxx>
Cc: mcgrof@xxxxxxxxxx; keescook@xxxxxxxxxxxx; yzaikin@xxxxxxxxxx; vbabka@xxxxxxx; nigupta@xxxxxxxxxx; bhe@xxxxxxxxxx; khalid.aziz@xxxxxxxxxx; iamjoonsoo.kim@xxxxxxx; mateusznosek0@xxxxxxxxx; sh_def@xxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>; David Rientjes <rientjes@xxxxxxxxxx>
Subject: Re: [PATCH v4] mm/compaction: let proactive compaction order configurable

On Wed, 28 Apr 2021 10:28:21 +0800 chukaiping <chukaiping@xxxxxxxxx> wrote:

> > Currently the proactive compaction order is fixed to 
> > COMPACTION_HPAGE_ORDER(9), it's OK in most machines with lots of 
> > normal 4KB memory, but it's too high for the machines with small 
> > normal memory, for example the machines with most memory configured as 
> > 1GB hugetlbfs huge pages. In these machines the max order of free 
> > pages is often below 9, and it's always below 9 even with hard 
> > compaction. This will lead to proactive compaction being triggered very
> > frequently. In these machines we only care about order of 3 or 4.
> > This patch exports the order to proc and lets it be configurable by the
> > user, and the default value is still COMPACTION_HPAGE_ORDER.

> It would be great to do this automatically?  It's quite simple to see when memory is being handed out to hugetlbfs - so can we tune proactive_compaction_order in response to this?  That would be far better than adding a manual tunable.

> But from having read Khalid's comments, that does sound quite involved.
> Is there some partial solution that we can come up with that will get most people out of trouble?

> That being said, this patch is super-super-simple so perhaps we should just merge it just to get one person (and hopefully a few more) out of trouble.  But on the other hand, once we add a /proc tunable we must maintain that tunable for ever (or at least a very long time) even if the internal implementations change a lot.

Currently the fragmentation index of each zone is computed per order; there is no single fragmentation index for the whole system, so we can only use a user-defined order for proactive compaction. I keep thinking about a way to calculate an average fragmentation index for the system, but so far I have not come up with one. I think we can just use the proc file to configure the order manually. If we come up with a better solution in the future, we can keep the proc file but remove its internal implementation.
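
To illustrate why the result depends on the chosen order, here is a small userspace sketch in Python. It is not kernel code: the function name and the free-list counts are made up for illustration, and the formula (percentage of free pages that sit in blocks smaller than the requested order) is only my assumption of how a per-order fragmentation measure can be modeled from buddy free-list counts.

```python
# Illustrative userspace model only; names and numbers are hypothetical.
# free_counts[o] is the number of free blocks of order o in one zone,
# i.e. the same shape as one row of /proc/buddyinfo.

def unusable_free_percent(free_counts, order):
    """Percent of free pages held in blocks too small to satisfy `order`."""
    free_pages = sum(n << o for o, n in enumerate(free_counts))
    if free_pages == 0:
        return 0
    # Pages in free blocks that are large enough for the requested order.
    suitable_pages = sum(n << o for o, n in enumerate(free_counts) if o >= order)
    return (free_pages - suitable_pages) * 100 // free_pages


# A made-up zone whose largest free block is order 4, as on a machine
# where most memory is reserved for 1GB hugetlbfs pages.
SAMPLE_ZONE = [1000, 500, 250, 120, 60, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    for order in (3, 4, 9):
        print(order, unusable_free_percent(SAMPLE_ZONE, order))
```

With these sample numbers the measure is 60 for order 3, 80 for order 4 and 100 for order 9. A zone like this always looks fully fragmented at order 9 no matter how much compaction is done, while at order 3 or 4 the value can still move, which is the case the tunable is meant for.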