Re: [PATCH] mm: incorporate read-only pages into transparent huge pages

On 01/23/2015 08:47 AM, Ebru Akagunduz wrote:
This patch aims to improve THP collapse rates, by allowing
THP collapse in the presence of read-only ptes, like those
left in place by do_swap_page after a read fault.

Any other examples? What about zero pages?

Currently THP can collapse 4kB pages into a THP when
there are up to khugepaged_max_ptes_none pte_none ptes
in a 2MB range. This patch applies the same limit for
read-only ptes.
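
As a rough illustration of the limit described above, the following is a minimal user-space sketch, not the kernel code. The `pte_state` enum and `range_may_collapse()` are hypothetical names made up for this example. It only models the two counters (`none` and `ro`) being checked against the same `max_ptes_none` cap, and leaves out every other check the real scan performs (page type, refcount, referenced bit, and so on).

```c
#include <stdbool.h>
#include <stddef.h>

#define HPAGE_PMD_NR 512 /* number of 4kB ptes in a 2MB range */

/* Hypothetical, simplified pte state; this is not the kernel's pte_t. */
enum pte_state { PTE_NONE, PTE_SWAPPED, PTE_READONLY, PTE_WRITABLE };

/*
 * Return true if the range passes the none/read-only limits.
 * max_ptes_none stands in for khugepaged_max_ptes_none.
 */
bool range_may_collapse(const enum pte_state *ptes, size_t n,
			int max_ptes_none)
{
	int none = 0, ro = 0;

	for (size_t i = 0; i < n; i++) {
		switch (ptes[i]) {
		case PTE_NONE:
			if (++none > max_ptes_none)
				return false;
			break;
		case PTE_SWAPPED:
			/* models !pte_present(): still refused */
			return false;
		case PTE_READONLY:
			/* with the patch: tolerated up to the same cap */
			if (++ro > max_ptes_none)
				return false;
			break;
		case PTE_WRITABLE:
			break;
		}
	}
	return true;
}
```

In this sketch, a range with two read-only entries is rejected when the cap is 1 and accepted when the cap is 2, while a single swapped-out entry rejects the range regardless of the cap.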

The patch was tested with a test program that allocates
800MB of memory, writes to it, and then sleeps. I force
the system to swap out all but 190MB of the program by
touching other memory. Afterwards, the test program does
a mix of reads and writes to its memory, and the memory
gets swapped back in.

Without the patch, only the memory that did not get
swapped out remained in THPs, which corresponds to 24% of
the memory of the program. The percentage did not increase
over time.

With this patch, after 5 minutes of waiting khugepaged had
collapsed 55% of the program's memory back into THPs.

Signed-off-by: Ebru Akagunduz <ebru.akagunduz@xxxxxxxxx>
Reviewed-by: Rik van Riel <riel@xxxxxxxxxx>

Sounds like a good idea.
Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
nits below:

---
I've written down test results:

With the patch:
After swapped out:
cat /proc/pid/smaps:
Anonymous:      100352 kB
AnonHugePages:  98304 kB
Swap:           699652 kB
Fraction:       97,95

cat /proc/meminfo:
AnonPages:      1763732 kB
AnonHugePages:  1716224 kB
Fraction:       97,30

After swapped in:
In a few seconds:
cat /proc/pid/smaps
Anonymous:      800004 kB
AnonHugePages:  235520 kB
Swap:           0 kB
Fraction:       29,43

cat /proc/meminfo:
AnonPages:      2464336 kB
AnonHugePages:  1853440 kB
Fraction:       75,21

In five minutes:
cat /proc/pid/smaps:
Anonymous:      800004 kB
AnonHugePages:  440320 kB
Swap:           0 kB
Fraction:       55,0

cat /proc/meminfo:
AnonPages:      2464340 kB
AnonHugePages:  2058240 kB
Fraction:       83,52

Without the patch:
After swapped out:
cat /proc/pid/smaps:
Anonymous:      190660 kB
AnonHugePages:  190464 kB
Swap:           609344 kB
Fraction:       99,89

cat /proc/meminfo:
AnonPages:      1740456 kB
AnonHugePages:  1667072 kB
Fraction:       95,78

After swapped in:
cat /proc/pid/smaps:
Anonymous:      800004 kB
AnonHugePages:  190464 kB
Swap:           0 kB
Fraction:       23,80

cat /proc/meminfo:
AnonPages:      2350032 kB
AnonHugePages:  1667072 kB
Fraction:       70,93

I waited 10 minutes; the fractions
did not change without the patch.
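
The "Fraction" lines above appear to be AnonHugePages divided by Anonymous (or by AnonPages for /proc/meminfo), expressed as a percentage and truncated. That reading is an assumption drawn from the numbers, not something the mail states. A small sketch to reproduce them, using a hypothetical helper `thp_fraction_bp()` that returns hundredths of a percent so that no floating point is needed:

```c
/*
 * Share of anonymous memory backed by huge pages, in hundredths of a
 * percent, truncated. Hypothetical helper for checking the figures
 * quoted in this mail; both arguments are in kB.
 */
unsigned long long thp_fraction_bp(unsigned long long anon_huge_kb,
				   unsigned long long anon_kb)
{
	if (anon_kb == 0)
		return 0;
	/* 64-bit arithmetic: anon_huge_kb * 10000 can exceed 32 bits */
	return anon_huge_kb * 10000ULL / anon_kb;
}
```

For example, thp_fraction_bp(440320, 800004) gives 5503, i.e. about 55% with the patch after five minutes, and thp_fraction_bp(190464, 800004) gives 2380, i.e. 23.80% without it.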

  mm/huge_memory.c | 25 ++++++++++++++++++++-----
  1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 817a875..af750d9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2158,7 +2158,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
  			else
  				goto out;
  		}
-		if (!pte_present(pteval) || !pte_write(pteval))
+		if (!pte_present(pteval))
  			goto out;
  		page = vm_normal_page(vma, address, pteval);
  		if (unlikely(!page))
@@ -2169,7 +2169,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
  		VM_BUG_ON_PAGE(!PageSwapBacked(page), page);

  		/* cannot use mapcount: can't collapse if there's a gup pin */
-		if (page_count(page) != 1)
+		if (page_count(page) != 1 + !!PageSwapCache(page))

Took me a while to grok this !!PageSwapCache(page) part. Perhaps expand the comment?

  			goto out;
  		/*
  		 * We can do it before isolate_lru_page because the
@@ -2179,6 +2179,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
  		 */
  		if (!trylock_page(page))
  			goto out;
+		if (!pte_write(pteval)) {
+			if (PageSwapCache(page) && !reuse_swap_page(page)) {
+					unlock_page(page);
+					goto out;

Too much indent on the 2 lines above.

+			}
+			/*
+			 * Page is not in the swap cache, and page count is
+			 * one (see above). It can be collapsed into a THP.
+			 */

Such a comment sounds like a good place for:

			VM_BUG_ON(page_count(page) != 1);

+		}
+
  		/*
  		 * Isolate the page to avoid collapsing an hugepage
  		 * currently in use by the VM.
@@ -2550,7 +2561,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
  {
  	pmd_t *pmd;
  	pte_t *pte, *_pte;
-	int ret = 0, referenced = 0, none = 0;
+	int ret = 0, referenced = 0, none = 0, ro = 0;
  	struct page *page;
  	unsigned long _address;
  	spinlock_t *ptl;
@@ -2573,8 +2584,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
  			else
  				goto out_unmap;
  		}
-		if (!pte_present(pteval) || !pte_write(pteval))
+		if (!pte_present(pteval))
  			goto out_unmap;
+		if (!pte_write(pteval)) {
+			if (++ro > khugepaged_max_ptes_none)
+				goto out_unmap;
+		}
  		page = vm_normal_page(vma, _address, pteval);
  		if (unlikely(!page))
  			goto out_unmap;
@@ -2592,7 +2607,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
  		if (!PageLRU(page) || PageLocked(page) || !PageAnon(page))
  			goto out_unmap;
  		/* cannot use mapcount: can't collapse if there's a gup pin */
-		if (page_count(page) != 1)
+		if (page_count(page) != 1 + !!PageSwapCache(page))

Same as above. Even more so, as there's no other page swap cache handling code in this function.

Thanks.

  			goto out_unmap;
  		if (pte_young(pteval) || PageReferenced(page) ||
  		    mmu_notifier_test_young(vma->vm_mm, address))
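
One way to read the `1 + !!PageSwapCache(page)` check questioned above: the page is expected to hold one reference for the pte that maps it, plus one more if it currently sits in the swap cache; any further reference is treated as a possible gup pin. The sketch below models just that arithmetic in user space. `struct fake_page`, `expected_refs()` and `has_unexpected_ref()` are hypothetical names for illustration and are not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two struct page properties involved. */
struct fake_page {
	int count;		/* models page_count(page) */
	bool in_swap_cache;	/* models PageSwapCache(page) */
};

/* One reference for the mapping pte, plus one held by the swap cache. */
int expected_refs(const struct fake_page *p)
{
	return 1 + !!p->in_swap_cache;
}

/* Mirrors the "!=" test in the diff: any mismatch blocks the collapse. */
bool has_unexpected_ref(const struct fake_page *p)
{
	return p->count != expected_refs(p);
}
```

Under this model a page with count 2 passes when it is in the swap cache and is refused when it is not, which is the distinction the plain `page_count(page) != 1` test could not make.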

