Re: [RFC PATCH 2/3] Add tunable to control THP behavior

On Thu, Dec 12, 2013 at 04:37:11PM -0500, Rik van Riel wrote:
> On 12/12/2013 01:00 PM, Alex Thorlton wrote:
> >This part of the patch adds a tunable to
> >/sys/kernel/mm/transparent_hugepage called threshold.  This threshold
> >determines how many pages a user must fault in from a single node before
> >a temporary compound page is turned into a THP.
> 
> >+++ b/mm/huge_memory.c
> >@@ -44,6 +44,9 @@ unsigned long transparent_hugepage_flags __read_mostly =
> >  	(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG)|
> >  	(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
> >
> >+/* default to 1 page threshold for handing out thps; maintains old behavior */
> >+static int transparent_hugepage_threshold = 1;
> 
> I assume the motivation for writing all this code is that "1"
> was not a good value in your tests.

Yes, that's correct.

> That makes me wonder, why should 1 be the default value with
> your patches?

The main reason I set the default to 1 is that the majority of
jobs aren't hurt by the existing THP behavior.  I figured it would be
best to default to having things behave the same as they do now, but
provide the option to increase the threshold on systems that run jobs
that could be adversely affected by the current behavior.

> If there is a better value, why should we not use that?
>
> What is the upside of using a better value?
>
> What is the downside?

The problem here is that the "better" value can vary greatly
depending on how a particular task allocates memory.  Setting the
threshold too high can hurt the performance of jobs that behave well
under the current behavior, while setting it too low won't yield a
performance increase for the jobs that are hurt by the current
behavior.  With some more thorough testing, I'm sure we could
arrive at a value that helps out jobs which behave poorly under
current conditions, while having a minimal effect on jobs that already
perform well.  At this point, I'm looking more to ensure that everybody
likes this approach to solving the problem before putting the finishing
touches on the patches and doing the testing to find a good middle ground.
 
> Is there a value that would bound the downside, so it
> is almost always smaller than the upside?

Again, the problem here is that, to find a good value, we have to know
quite a bit about why a particular value is bad for a particular job.
While, as stated above, I think we can probably find a good middle
ground to use as a default, in the end it will be up to individual
sysadmins to determine what value works best for their particular
applications, and to tune things accordingly.
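For illustration, with this RFC applied the tuning would follow the same
pattern as the existing THP sysfs knobs.  This is a sketch, not tested
output; the file name comes from the patch description above, and the
value 64 is an arbitrary example, not a recommended setting:

```shell
# Sketch of how a sysadmin would use the proposed tunable, assuming
# this RFC is applied: it adds a "threshold" file under the existing
# /sys/kernel/mm/transparent_hugepage/ directory.

# Read the current threshold (defaults to 1, i.e. today's behavior:
# a single fault is enough to hand out a THP).
cat /sys/kernel/mm/transparent_hugepage/threshold

# Example: require 64 pages faulted in from a single node before a
# temporary compound page is promoted to a THP (64 is an arbitrary
# illustrative value).
echo 64 > /sys/kernel/mm/transparent_hugepage/threshold
```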

- Alex

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .



