Re: [PATCH v8 00/10] Multi-size THP for anonymous memory

Ryan Roberts <ryan.roberts@xxxxxxx> · Tue, 5 Dec 2023 11:13:18 +0000

On 05/12/2023 03:37, John Hubbard wrote:
> On 12/4/23 02:20, Ryan Roberts wrote:
>> Hi All,
>>
>> A new week, a new version, a new name... This is v8 of a series to implement
>> multi-size THP (mTHP) for anonymous memory (previously called "small-sized THP"
>> and "large anonymous folios"). Matthew objected to "small huge", so hopefully
>> this fares better.
>>
>> The objective of this is to improve performance by allocating larger chunks of
>> memory during anonymous page faults:
>>
>> 1) Since SW (the kernel) is dealing with larger chunks of memory than base
>>     pages, there are efficiency savings to be had; fewer page faults, batched PTE
>>     and RMAP manipulation, fewer LRU list entries, etc. In short, we reduce kernel
>>     overhead. This should benefit all architectures.
>> 2) Since we are now mapping physically contiguous chunks of memory, we can take
>>     advantage of HW TLB compression techniques. A reduction in TLB pressure
>>     speeds up kernel and user space. arm64 systems have 2 mechanisms to coalesce
>>     TLB entries; "the contiguous bit" (architectural) and HPA (uarch).
>>
>> This version changes the name and tidies up some of the kernel code and test
>> code, based on feedback against v7 (see change log for details).
> 
> Using a couple of Armv8 systems, I've tested this patchset. I applied it
> to top of tree (Linux 6.7-rc4), on top of your latest contig pte series
> [1].
> 
> With those two patchsets applied, the mm selftests look OK--or at least
> as OK as they normally do. I compared test runs between THP/mTHP set to
> "always", vs "never", to verify that there were no new test failures.
> Specifically, I set one particular mTHP size (2 MB) to
> "inherit", and then toggled /sys/kernel/mm/transparent_hugepage/enabled
> between "always" and "never".
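The A/B procedure described above (one size set to "inherit", then the global control toggled) can be sketched as follows. This is a minimal sketch that assumes the mainline per-size layout, `/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled`; the v8 series may name things differently. The helper names `per_size_control` and `configure_run` are hypothetical, and the sysfs root is a parameter so the logic can be exercised outside a real system.

```python
from pathlib import Path

# Assumption: mainline-style mTHP sysfs layout. Per-size directories are
# named hugepages-<size>kB under the THP root; v8 may differ.
THP_ROOT = Path("/sys/kernel/mm/transparent_hugepage")


def per_size_control(root: Path, size_kb: int) -> Path:
    """Hypothetical helper: path of the per-size 'enabled' control."""
    return root / f"hugepages-{size_kb}kB" / "enabled"


def configure_run(root: Path, size_kb: int, global_mode: str) -> None:
    """Set one mTHP size to 'inherit', then set the global THP mode.

    A size whose control is 'inherit' follows the global 'enabled' file,
    so switching the global file between 'always' and 'never' turns that
    size on and off for an A/B test run.
    """
    if global_mode not in ("always", "madvise", "never"):
        raise ValueError(f"unexpected global mode: {global_mode}")
    per_size_control(root, size_kb).write_text("inherit\n")
    (root / "enabled").write_text(global_mode + "\n")
```

On a real system this would be run as root, e.g. `configure_run(THP_ROOT, 2048, "always")` for one pass and `configure_run(THP_ROOT, 2048, "never")` for the other.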

Excellent - I'm guessing this was for 64K base pages?
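One plausible reason the base page size matters here: with 4K base pages a 2 MB folio is PMD-sized (classic THP), while with 64K base pages the PMD maps 512 MB, so 2 MB becomes a smaller, multi-size THP. The sketch below computes both quantities, assuming arm64-style page tables with 8-byte entries where each table occupies exactly one base page; the function names are illustrative only.

```python
def folio_order(size_bytes: int, base_page_bytes: int) -> int:
    """Order of a folio: log2 of the number of base pages it spans."""
    ratio, rem = divmod(size_bytes, base_page_bytes)
    # Folios must be a power-of-two multiple of the base page size.
    if rem or ratio < 1 or ratio & (ratio - 1):
        raise ValueError("size must be a power-of-two multiple of the base page")
    return ratio.bit_length() - 1


def pmd_size(base_page_bytes: int, pte_bytes: int = 8) -> int:
    """Bytes mapped by one PMD entry, assuming one page-table page per level."""
    entries_per_table = base_page_bytes // pte_bytes
    return entries_per_table * base_page_bytes
```

For example, `folio_order(2 << 20, 4096)` is 9 and `pmd_size(4096)` is 2 MB, whereas `folio_order(2 << 20, 65536)` is 5 and `pmd_size(65536)` is 512 MB.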

> 
> I also re-ran my usual compute/AI benchmark, and I'm still seeing the
> same 10x performance improvement that I reported for the v6 patchset.
> 
> So for this patchset and for [1] as well, please feel free to add:
> 
> Tested-by: John Hubbard <jhubbard@xxxxxxxxxx>

Thanks!

> 
> 
> [1] https://lore.kernel.org/all/20231204105440.61448-1-ryan.roberts@xxxxxxx/
> 
> 
> thanks,