On 9/19/24 07:04, Barry Song wrote:
On Mon, Sep 16, 2024 at 11:08 PM Dev Jain <dev.jain@xxxxxxx> wrote:
For an mTHP allocation, we need to check, for every order, whether for
that order, we have enough number of contiguous PTEs empty. Instead of
iterating the while loop for every order, use some information, which
is the first set PTE found, from the previous iteration to eliminate
some cases. The key to understanding the correctness of the patch
is that the ranges we want to examine form a strictly decreasing
sequence of nested intervals.
Could we include some benchmark data here, as suggested by Ryan in this thread?
https://lore.kernel.org/linux-mm/58f91a56-890a-45d0-8b1f-47c4c70c9600@xxxxxxx/
Can you please verify and get some numbers for the following program,
because if I am doing this correctly, it would be a regression :)
https://www.codedump.xyz/cpp/Zuvf8FwvRPH21UO2
The program does this: disable THP completely -> mmap 1G VMA -> touch the last
page of a 32K sized boundary. That is, 0th till 32K/4K - 2 pages are
empty, while the 32K/4K - 1'th page is touched, and so on -> madvise
the entire VMA -> enable all THPs except 2M -> touch all pages.
Therefore, we have 0 - 6 PTEs empty, 7th is filled, and so on. Eventually,
kernel will fall down to finding 4 contiguous PTEs empty and allocate
4K * 4 = 16K mTHP.
The result without the patches:
real: 8.250s
user: 0.941s
sys: 7.077s
real: 8.175s
user: 0.939s
sys: 7.021s
With the patches:
real: 8.584s
user: 1.089s
sys: 7.234s
real: 8.429s
user: 0.954s
sys: 7.220s
You can change the #iterations in the for loop to magnify this,
and the current code surprisingly wins.
Suggested-by: Ryan Roberts <ryan.roberts@xxxxxxx>
Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
---
mm/memory.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 8bb1236de93c..e81c6abe09ce 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4633,10 +4633,11 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ pte_t *first_set_pte = NULL, *align_pte, *pte;
unsigned long orders;
struct folio *folio;
unsigned long addr;
- pte_t *pte;
+ int max_empty;
gfp_t gfp;
int order;
@@ -4671,8 +4672,23 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
order = highest_order(orders);
while (orders) {
addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
- if (pte_range_none(pte + pte_index(addr), 1 << order) == 1 << order)
+ align_pte = pte + pte_index(addr);
+
+ /* Range to be scanned known to be empty */
+ if (align_pte + (1 << order) <= first_set_pte)
+ break;
+
+ /* Range to be scanned contains first_set_pte */
+ if (align_pte <= first_set_pte)
+ goto repeat;
+
+ /* align_pte > first_set_pte, so need to check properly */
+ max_empty = pte_range_none(align_pte, 1 << order);
+ if (max_empty == 1 << order)
break;
+
+ first_set_pte = align_pte + max_empty;
+repeat:
order = next_order(&orders, order);
}
--
2.30.2
Thanks
barry