On Thu, Feb 01, 2018 at 01:27:30PM +0300, Kirill A. Shutemov wrote:
> > It's non-trivial to do this because at minimum a page fault has to check
> > if there is a potential promotion candidate by checking the PTEs around
> > the faulting address searching for a correctly-aligned base page that is
> > already inserted. If there is, then check if the correctly aligned base
> > page for the current faulting address is free and if so use it. It'll
> > also then need to check the remaining PTEs to see if both the promotion
> > threshold has been reached and if so, promote it to a THP (or else teach
> > khugepaged to do an in-place promotion if possible). In other words,
> > implementing the promotion threshold is both hard and it's not free.
>
> "not free" is an understatement.
>
> Converting PTE page table to PMD would require down_write(mmap_sem).
> Doing it from within page fault path would also mean that we need to drop
> down_read(mmap) we hold, re-acquire it with down_write(), find the vma again
> and re-validate that nothing changed in the meantime...
>
> That's an interesting exercise, but I'm skeptical it would result in anything
> practical.
>

The details are painful but we're somewhat caught between a rock and a hard
place for workloads that sparsely reference memory and want to avoid
excessive memory usage. Given that the cost will be high, it may need to
dynamically detect what the promotion threshold is -- default high and
reduce it on a per-task basis if promotions are frequent.

Either way, expecting applications to get it right with hints is the road
to hell paved with good intentions. If they were able to get this right,
they would be using prctl(PR_SET_THP_DISABLE) already.

-- 
Mel Gorman
SUSE Labs
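For illustration only, here is a minimal userspace model of the per-task heuristic suggested above: a high default promotion threshold that is lowered for a task when promotions turn out to be frequent. It assumes x86-64 geometry (4 KiB base pages, 2 MiB THPs, 512 PTEs per PMD). Every name and constant (struct task_thp_state, struct pte_table, handle_fault(), DEFAULT_THRESHOLD, ADJUST_PERIOD, FREQUENT_PROMOTIONS, MIN_THRESHOLD) is hypothetical and does not correspond to kernel code. The model ignores mmap_sem entirely and represents the actual PTE-to-PMD collapse with a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumes x86-64: 4 KiB base pages and 2 MiB THPs, so 512 PTEs per PMD. */
#define PTRS_PER_PMD 512

/* Hypothetical tuning knobs, chosen only for illustration. */
#define DEFAULT_THRESHOLD   448  /* "default high" */
#define MIN_THRESHOLD       64   /* never lower the threshold below this */
#define ADJUST_PERIOD       1024 /* faults between threshold re-evaluations */
#define FREQUENT_PROMOTIONS 2    /* promotions per period treated as "frequent" */

/* Hypothetical per-task state; not a real kernel structure. */
struct task_thp_state {
	unsigned int threshold;   /* populated PTEs needed before promotion */
	unsigned int faults;      /* faults seen in the current period */
	unsigned int promotions;  /* promotions in the current period */
};

/* One PMD-aligned PTE page table, modelled as a presence map. */
struct pte_table {
	bool present[PTRS_PER_PMD];
	bool huge;                /* true once "promoted" to a THP */
};

/* Scan the whole aligned window; this is the per-fault cost being discussed. */
static unsigned int count_present(const struct pte_table *pt)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < PTRS_PER_PMD; i++)
		n += pt->present[i];
	return n;
}

/*
 * Model a fault on entry idx of pt. Returns true if this fault pushed the
 * table to the task's threshold and it was promoted. A real kernel would
 * need mmap_sem held for write to collapse the table (or would defer the
 * work to khugepaged); this model ignores locking entirely.
 */
static bool handle_fault(struct task_thp_state *ts, struct pte_table *pt,
			 unsigned int idx)
{
	bool promoted = false;

	assert(idx < PTRS_PER_PMD);
	pt->present[idx] = true;
	if (!pt->huge && count_present(pt) >= ts->threshold) {
		pt->huge = true;  /* stands in for collapsing PTEs into a PMD */
		ts->promotions++;
		promoted = true;
	}

	/* Per-task adaptation: only ever lower the threshold, and only
	 * when promotions were frequent during the last period. */
	if (++ts->faults >= ADJUST_PERIOD) {
		if (ts->promotions >= FREQUENT_PROMOTIONS &&
		    ts->threshold / 2 >= MIN_THRESHOLD)
			ts->threshold /= 2;
		ts->faults = 0;
		ts->promotions = 0;
	}
	return promoted;
}
```

With a fresh task state, touching all 512 entries of two tables in turn promotes each one on its 448th populated entry, and on the 1024th fault the threshold drops to 224. Touching only every eighth entry of sixteen tables never promotes anything and leaves the threshold at 448, which is the sparse-workload case the high default is meant to protect. Note that count_present() rescans the whole 512-entry window on every fault, which is exactly the "not free" part of the check.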