On 01/23/2015 08:47 AM, Ebru Akagunduz wrote:
This patch aims to improve THP collapse rates, by allowing THP collapse in the presence of read-only ptes, like those left in place by do_swap_page after a read fault.
An other examples? What about zero pages?
Currently THP can collapse 4kB pages into a THP when there are up to khugepaged_max_ptes_none pte_none ptes in a 2MB range. This patch applies the same limit for read-only ptes. The patch was tested with a test program that allocates 800MB of memory, writes to it, and then sleeps. I force the system to swap out all but 190MB of the program by touching other memory. Afterwards, the test program does a mix of reads and writes to its memory, and the memory gets swapped back in. Without the patch, only the memory that did not get swapped out remained in THPs, which corresponds to 24% of the memory of the program. The percentage did not increase over time. With this patch, after 5 minutes of waiting khugepaged had collapsed 55% of the program's memory back into THPs. Signed-off-by: Ebru Akagunduz <ebru.akagunduz@xxxxxxxxx> Reviewed-by: Rik van Riel <riel@xxxxxxxxxx>
Sounds like a good idea. Acked-by: Vlastimil Babka <vbabka@xxxxxxx> nits below:
--- I've written down test results: With the patch: After swapped out: cat /proc/pid/smaps: Anonymous: 100352 kB AnonHugePages: 98304 kB Swap: 699652 kB Fraction: 97,95 cat /proc/meminfo: AnonPages: 1763732 kB AnonHugePages: 1716224 kB Fraction: 97,30 After swapped in: In a few seconds: cat /proc/pid/smaps Anonymous: 800004 kB AnonHugePages: 235520 kB Swap: 0 kB Fraction: 29,43 cat /proc/meminfo: AnonPages: 2464336 kB AnonHugePages: 1853440 kB Fraction: 75,21 In five minutes: cat /proc/pid/smaps: Anonymous: 800004 kB AnonHugePages: 440320 kB Swap: 0 kB Fraction: 55,0 cat /proc/meminfo: AnonPages: 2464340 AnonHugePages: 2058240 Fraction: 83,52 Without the patch: After swapped out: cat /proc/pid/smaps: Anonymous: 190660 kB AnonHugePages: 190464 kB Swap: 609344 kB Fraction: 99,89 cat /proc/meminfo: AnonPages: 1740456 kB AnonHugePages: 1667072 kB Fraction: 95,78 After swapped in: cat /proc/pid/smaps: Anonymous: 800004 kB AnonHugePages: 190464 kB Swap: 0 kB Fraction: 23,80 cat /proc/meminfo: AnonPages: 2350032 kB AnonHugePages: 1667072 kB Fraction: 70,93 I waited 10 minutes the fractions did not change without the patch. mm/huge_memory.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 817a875..af750d9 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2158,7 +2158,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, else goto out; } - if (!pte_present(pteval) || !pte_write(pteval)) + if (!pte_present(pteval)) goto out; page = vm_normal_page(vma, address, pteval); if (unlikely(!page)) @@ -2169,7 +2169,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, VM_BUG_ON_PAGE(!PageSwapBacked(page), page); /* cannot use mapcount: can't collapse if there's a gup pin */ - if (page_count(page) != 1) + if (page_count(page) != 1 + !!PageSwapCache(page))
Took me a while to grok this !!PageSwapCache(page) part. Perhaps expand the comment?
goto out; /* * We can do it before isolate_lru_page because the @@ -2179,6 +2179,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, */ if (!trylock_page(page)) goto out; + if (!pte_write(pteval)) { + if (PageSwapCache(page) && !reuse_swap_page(page)) { + unlock_page(page); + goto out;
Too much indent on the 2 lines above.
+ } + /* + * Page is not in the swap cache, and page count is + * one (see above). It can be collapsed into a THP. + */
Such comment sounds like a good place for: VM_BUG_ON(page_count(page) != 1));
+ } + /* * Isolate the page to avoid collapsing an hugepage * currently in use by the VM. @@ -2550,7 +2561,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, { pmd_t *pmd; pte_t *pte, *_pte; - int ret = 0, referenced = 0, none = 0; + int ret = 0, referenced = 0, none = 0, ro = 0; struct page *page; unsigned long _address; spinlock_t *ptl; @@ -2573,8 +2584,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, else goto out_unmap; } - if (!pte_present(pteval) || !pte_write(pteval)) + if (!pte_present(pteval)) goto out_unmap; + if (!pte_write(pteval)) { + if (++ro > khugepaged_max_ptes_none) + goto out_unmap; + } page = vm_normal_page(vma, _address, pteval); if (unlikely(!page)) goto out_unmap; @@ -2592,7 +2607,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, if (!PageLRU(page) || PageLocked(page) || !PageAnon(page)) goto out_unmap; /* cannot use mapcount: can't collapse if there's a gup pin */ - if (page_count(page) != 1) + if (page_count(page) != 1 + !!PageSwapCache(page))
Same as above. Even more so, as there's no other page swap cache handling code in this function.
Thanks.
goto out_unmap; if (pte_young(pteval) || PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm, address))
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>