On Thu, Nov 08, 2018 at 10:48:58PM -0800, Anthony Yznaga wrote:
> The basic idea as outlined by Mel Gorman in [2] is:
>
> 1) On first fault in a sufficiently sized range, allocate a huge page
>    sized and aligned block of base pages.  Map the base page
>    corresponding to the fault address and hold the rest of the pages in
>    reserve.
> 2) On subsequent faults in the range, map the pages from the reservation.
> 3) When enough pages have been mapped, promote the mapped pages and
>    remaining pages in the reservation to a huge page.
> 4) When there is memory pressure, release the unused pages from their
>    reservations.

I haven't yet read the patch in detail, but I'm skeptical about the
approach in general for a few reasons:

 - Retracting a PTE page table to replace it with a huge PMD entry
   requires down_write(mmap_sem). That makes the approach impractical
   for many multi-threaded workloads. I don't see a way to avoid an
   exclusive lock here. I will be glad to be proven otherwise.

 - The promotion will also require a TLB flush, which might be
   prohibitively slow on big machines.

 - Short-lived processes will fail to benefit from THP under this
   policy, even with plenty of free memory in the system: there is no
   time to promote to THP, or, with synchronous promotion, the cost
   will outweigh the benefit.

The goal of reducing the memory overhead of THP is admirable, but we
need to be careful not to kill the benefit of THP itself. The approach
will reduce the number of THPs mapped in the system and/or shift their
allocation to a later stage of the process lifetime.

The only way I see it can be useful is if it is possible to apply the
policy on a per-VMA basis. That would be very useful for malloc()
implementations, for instance. But as a global policy it's a no-go to
me.

Prove me wrong with performance data. :)

-- 
 Kirill A. Shutemov
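For readers outside mm/, the reserve-then-promote policy quoted above can be sketched as a toy user-space model. This is not kernel code: the class, the page lists, and the promotion threshold are all illustrative assumptions; the real patch manipulates struct pages and page tables, and the promotion step is where the down_write(mmap_sem) and TLB-flush costs discussed in this thread would apply.

```python
# Toy model of the reserve/promote policy (steps 1-4 above).
# HUGE_PAGE_NR matches x86-64 (2 MiB / 4 KiB); PROMOTE_THRESHOLD is a
# made-up "enough pages mapped" cutoff, not a value from the patch.

HUGE_PAGE_NR = 512
PROMOTE_THRESHOLD = 448

class Reservation:
    def __init__(self):
        # Step 1: on first fault, allocate a huge-page-sized and
        # -aligned block of base pages and hold them in reserve.
        self.pages = list(range(HUGE_PAGE_NR))
        self.mapped = set()
        self.promoted = False

    def fault(self, index):
        # Steps 1-2: map the faulting base page from the reservation.
        if self.promoted:
            return
        self.mapped.add(index)
        # Step 3: once enough pages are mapped, promote the range.
        if len(self.mapped) >= PROMOTE_THRESHOLD:
            self.promote()

    def promote(self):
        # In the kernel this is the expensive part: retracting the PTE
        # table under down_write(mmap_sem) and flushing the TLB before
        # installing the huge PMD.
        self.promoted = True
        self.mapped = set(self.pages)

    def shrink(self):
        # Step 4: under memory pressure, release unused reserved pages.
        if self.promoted:
            return 0
        unused = [p for p in self.pages if p not in self.mapped]
        self.pages = [p for p in self.pages if p in self.mapped]
        return len(unused)

r = Reservation()
for i in range(PROMOTE_THRESHOLD - 1):   # one fault short of promotion
    r.fault(i)
freed = r.shrink()                       # pressure strikes first
print(freed)                             # 512 - 447 = 65 pages released
```

The model also makes the short-lived-process objection concrete: a process that exits (or hits memory pressure) before reaching the threshold never promotes, so it pays the reservation bookkeeping without ever getting a huge mapping.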