+ mm-free-swp_entry-in-madvise_free.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The patch titled
     Subject: mm/madvise.c: free swp_entry in madvise_free
has been added to the -mm tree.  Its filename is
     mm-free-swp_entry-in-madvise_free.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-free-swp_entry-in-madvise_free.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-free-swp_entry-in-madvise_free.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Minchan Kim <minchan@xxxxxxxxxx>
Subject: mm/madvise.c: free swp_entry in madvise_free

When I test below piece of code with 12 processes(ie, 512M * 12 = 6G
consume) on my (3G ram + 12 cpu + 8G swap, the madvise_free is siginficat
slower (ie, 2x times) than madvise_dontneed.

loop = 5;
mmap(512M);
while (loop--) {
        memset(512M);
        madvise(MADV_FREE or MADV_DONTNEED);
}

The reason is lots of swapin.

1) dontneed: 1,612 swapin
2) madvfree: 879,585 swapin

If we find hinted pages were already swapped out when syscall is called,
it's pointless to keep the swapped-out pages in pte.  Instead, let's free
the cold page because swapin is more expensive than (alloc page +
zeroing).

With this patch, it reduced swapin from 879,585 to 1,878 so elapsed time

1) dontneed: 6.10user 233.50system 0:50.44elapsed
2) madvfree: 6.03user 401.17system 1:30.67elapsed
2) madvfree + below patch: 6.70user 339.14system 1:04.45elapsed

Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
Acked-by: Michal Hocko <mhocko@xxxxxxxx>
Acked-by: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: "James E.J. Bottomley" <jejb@xxxxxxxxxxxxxxxx>
Cc: "Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx>
Cc: "Shaohua Li" <shli@xxxxxxxxxx>
Cc: <yalin.wang2010@xxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Chen Gang <gang.chen.5i5j@xxxxxxxxx>
Cc: Chris Zankel <chris@xxxxxxxxxx>
Cc: Daniel Micay <danielmicay@xxxxxxxxx>
Cc: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
Cc: David S. Miller <davem@xxxxxxxxxxxxx>
Cc: Helge Deller <deller@xxxxxx>
Cc: Ivan Kokshaysky <ink@xxxxxxxxxxxxxxxxxxxx>
Cc: Jason Evans <je@xxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
Cc: Kirill A. Shutemov <kirill@xxxxxxxxxxxxx>
Cc: Matt Turner <mattst88@xxxxxxxxx>
Cc: Max Filippov <jcmvbkbc@xxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Michael Kerrisk <mtk.manpages@xxxxxxxxx>
Cc: Mika Penttil <mika.penttila@xxxxxxxxxxxx>
Cc: Ralf Baechle <ralf@xxxxxxxxxxxxxx>
Cc: Richard Henderson <rth@xxxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Roland Dreier <roland@xxxxxxxxxx>
Cc: Russell King <rmk@xxxxxxxxxxxxxxxx>
Cc: Shaohua Li <shli@xxxxxxxxxx>
Cc: Will Deacon <will.deacon@xxxxxxx>
Cc: Wu Fengguang <fengguang.wu@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/madvise.c |   25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff -puN mm/madvise.c~mm-free-swp_entry-in-madvise_free mm/madvise.c
--- a/mm/madvise.c~mm-free-swp_entry-in-madvise_free
+++ a/mm/madvise.c
@@ -270,6 +270,7 @@ static int madvise_free_pte_range(pmd_t
 	spinlock_t *ptl;
 	pte_t *orig_pte, *pte, ptent;
 	struct page *page;
+	int nr_swap = 0;
 
 	split_huge_pmd(vma, pmd, addr);
 	if (pmd_trans_unstable(pmd))
@@ -280,8 +281,24 @@ static int madvise_free_pte_range(pmd_t
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		ptent = *pte;
 
-		if (!pte_present(ptent))
+		if (pte_none(ptent))
 			continue;
+		/*
+		 * If the pte has swp_entry, just clear page table to
+		 * prevent swap-in which is more expensive rather than
+		 * (page allocation + zeroing).
+		 */
+		if (!pte_present(ptent)) {
+			swp_entry_t entry;
+
+			entry = pte_to_swp_entry(ptent);
+			if (non_swap_entry(entry))
+				continue;
+			nr_swap--;
+			free_swap_and_cache(entry);
+			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+			continue;
+		}
 
 		page = vm_normal_page(vma, addr, ptent);
 		if (!page)
@@ -355,6 +372,12 @@ static int madvise_free_pte_range(pmd_t
 		}
 	}
 out:
+	if (nr_swap) {
+		if (current->mm == mm)
+			sync_mm_rss(mm);
+
+		add_mm_counter(mm, MM_SWAPENTS, nr_swap);
+	}
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(orig_pte, ptl);
 	cond_resched();
_

Patches currently in -mm which might be from minchan@xxxxxxxxxx are

zram-pass-gfp-from-zcomp-frontend-to-backend.patch
mm-support-madvisemadv_free.patch
mm-support-madvisemadv_free-fix.patch
mm-define-madv_free-for-some-arches.patch
mm-free-swp_entry-in-madvise_free.patch
mm-move-lazily-freed-pages-to-inactive-list.patch
mm-mark-stable-page-dirty-in-ksm.patch
x86-add-pmd_-for-thp.patch
sparc-add-pmd_-for-thp.patch
powerpc-add-pmd_-for-thp.patch
arm-add-pmd_mkclean-for-thp.patch
arm64-add-pmd_mkclean-for-thp.patch
mm-dont-split-thp-page-when-syscall-is-called.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Kernel Newbies FAQ]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Photo]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux