On 12/15/17 2:00 AM, Kirill A. Shutemov wrote:
> On Thu, Dec 14, 2017 at 05:28:52PM -0800, Nitin Gupta wrote:
>> Currently, if the THP enabled policy is "always", or the mode
>> is "madvise" and a region is marked as MADV_HUGEPAGE, a hugepage
>> is allocated on a page fault if the pud or pmd is empty. This
>> yields the best VA translation performance, but increases memory
>> consumption if some small page ranges within the huge page are
>> never accessed.
>>
>> An alternate behavior for such page faults is to install a
>> hugepage only when a region is actually found to be (almost)
>> fully mapped and active. This is a compromise between
>> translation performance and memory consumption. Currently there
>> is no way for an application to choose this compromise for the
>> page fault conditions above.
>>
>> With this change, when an application issues MADV_DONTNEED on a
>> memory region, the region is marked as "space-efficient". For
>> such regions, a hugepage is not immediately allocated on first
>> write. Instead, it is left to the khugepaged thread to do
>> delayed hugepage promotion depending on whether the region is
>> actually mapped and active. When application issues
>> MADV_HUGEPAGE, the region is marked again as non-space-efficient
>> wherein hugepage is allocated on first touch.
>
> I think this would be NAK. At least in this form.
>
> What performance testing have you done? Any numbers?
>

I wrote a throw-away program that mmaps a 128G area and writes to a
random address in a loop. Interleaved with the writes,
madvise(MADV_DONTNEED) calls are issued at other random addresses.
Writes are issued with 70% probability and DONTNEED with 30%. With
this test, I'm trying to emulate the workload of a large in-memory
hash table.

With the patch, I see that memory bloat is much less severe.
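For readers who want a feel for the access pattern, here is a minimal
sketch of such a workload. It is not the program from the gist: the
names run_workload, workload_stats and next_rand are hypothetical, and
the page-granular addressing, the single-page DONTNEED length, the
MAP_NORESERVE flag and the xorshift generator are all assumptions made
for illustration. Only the 70%/30% write/DONTNEED split comes from the
description above.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical counters for the two kinds of operations issued. */
struct workload_stats {
    size_t writes;
    size_t discards;
};

/* Small xorshift64 generator so the run is reproducible (assumption). */
static uint64_t next_rand(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/*
 * Map an anonymous region, then repeatedly pick a random page and either
 * write to it (70%) or drop it with MADV_DONTNEED (30%).
 * Returns 0 on success, -1 on any failure.
 */
static int run_workload(size_t region_bytes, size_t iterations,
                        uint64_t seed, struct workload_stats *stats)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || region_bytes < (size_t)page)
        return -1;

    size_t page_size = (size_t)page;
    size_t npages = region_bytes / page_size;
    size_t len = npages * page_size;

    /* MAP_NORESERVE is an assumption: it avoids reserving swap up front. */
    unsigned char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                               -1, 0);
    if (base == MAP_FAILED)
        return -1;

    uint64_t rng = seed ? seed : 1;  /* xorshift state must be non-zero */
    stats->writes = 0;
    stats->discards = 0;

    for (size_t i = 0; i < iterations; i++) {
        unsigned char *addr = base + (next_rand(&rng) % npages) * page_size;

        if (next_rand(&rng) % 100 < 70) {
            *addr = 1;               /* touch the page: may fault it in */
            stats->writes++;
        } else {
            /* Discard one base page at a random location. */
            if (madvise(addr, page_size, MADV_DONTNEED) != 0) {
                munmap(base, len);
                return -1;
            }
            stats->discards++;
        }
    }

    munmap(base, len);
    return 0;
}
```

Calling run_workload(128UL << 30, n, seed, &stats) would approximate
the 128G case on a 64-bit machine. Memory usage can be watched from
outside the process, for example via the AnonHugePages field in
/proc/meminfo, while the loop runs.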
I've uploaded the test program with the memory usage plot here:

https://gist.github.com/nitingupta910/42ddf969e17556d74a14fbd84640ddb3

THP was set to 'always' mode in both cases, but the result would be
the same if madvise mode were used instead.

> Making whole vma "space_efficient" just because somebody freed one page
> from it is just wrong. And there's no way back after this.
>

I'm using MADV_DONTNEED as a hint that the user wants to use hugepages
transparently but also wants to be more conservative with respect to
memory usage. If MADV_HUGEPAGE is issued for a VMA range after any
DONTNEEDs, the space_efficient bit is cleared again, so we revert to
allocating a hugepage on fault on an empty pud/pmd.

>>
>> Orabug: 26910556
>
> Wat?
>

It's an Oracle-internal identifier used to track this work.

Thanks,
Nitin