On Thu, Dec 12, 2013 at 04:37:11PM -0500, Rik van Riel wrote:
> On 12/12/2013 01:00 PM, Alex Thorlton wrote:
> >This part of the patch adds a tunable to
> >/sys/kernel/mm/transparent_hugepage called threshold.  This threshold
> >determines how many pages a user must fault in from a single node before
> >a temporary compound page is turned into a THP.
>
> >+++ b/mm/huge_memory.c
> >@@ -44,6 +44,9 @@ unsigned long transparent_hugepage_flags __read_mostly =
> >  	(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG)|
> >  	(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
> >
> >+/* default to 1 page threshold for handing out thps; maintains old behavior */
> >+static int transparent_hugepage_threshold = 1;
>
> I assume the motivation for writing all this code is that "1"
> was not a good value in your tests.

Yes, that's correct.

> That makes me wonder, why should 1 be the default value with
> your patches?

The main reason I set the default to 1 is that the majority of jobs
aren't hurt by the existing THP behavior.  I figured it would be best to
default to having things behave the same as they do now, but to provide
the option of raising the threshold on systems that run jobs which are
adversely affected by the current behavior.

> If there is a better value, why should we not use that?
>
> What is the upside of using a better value?
>
> What is the downside?

The problem is that the "better" value can vary greatly depending on
how a particular task allocates memory.  Setting the threshold too high
can hurt the performance of jobs that already do well with the current
behavior; setting it too low won't yield a performance increase for the
jobs that are hurt by the current behavior.

With some more thorough testing, I'm sure that we could arrive at a
value that helps the jobs which behave poorly under current conditions,
while having a minimal effect on jobs that already perform well.
At this point, I'm mainly looking to make sure that everybody likes this
approach to the problem before I put the finishing touches on the
patches and do the testing needed to find a good middle ground.

> Is there a value that would bound the downside, so it
> is almost always smaller than the upside?

Again, the problem is that finding a good value requires knowing quite
a bit about why a particular value is bad for a particular job.  As
stated above, I think we can probably find a reasonable middle ground
to use as a default, but in the end it will be up to individual
sysadmins to determine which value works best for their applications
and tune things accordingly.

- Alex