+ mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap.patch added to -mm tree

The patch titled
     Subject: mm, THP, swap: enable PMD swap operations for CONFIG_THP_SWAP
has been added to the -mm tree.  Its filename is
     mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Huang Ying <ying.huang@xxxxxxxxx>
Subject: mm, THP, swap: enable PMD swap operations for CONFIG_THP_SWAP

Patch series "mm, THP, swap: Swapout/swapin THP in one piece", v4.

This is the final step of the THP (Transparent Huge Page) swap
optimization.  After the first and second steps, the splitting of the huge
page is delayed from almost the first step of swapout to after swapout has
finished.  In this step, we avoid splitting the THP for swapout and swap
the THP out and in in one piece.

We tested the patchset with the swap-w-seq test case of the vm-scalability
benchmark.  The test case forks 16 processes.  Each process allocates a
large anonymous memory range and writes it from beginning to end for 8
rounds.  The first round only swaps out, while the remaining rounds swap
in and swap out.  The test was done on a Xeon E5 v3 system; the swap
device is a RAM-simulated PMEM (persistent memory) device.  The test
results are as follows,

            base                  optimized
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   1417897 ±  2%    +992.8%   15494673        vm-scalability.throughput
   1020489 ±  4%   +1091.2%   12156349        vmstat.swap.si
   1255093 ±  3%    +940.3%   13056114        vmstat.swap.so
   1259769 ±  7%   +1818.3%   24166779        meminfo.AnonHugePages
  28021761           -10.7%   25018848 ±  2%  meminfo.AnonPages
  64080064 ±  4%     -95.6%    2787565 ± 33%  interrupts.CAL:Function_call_interrupts
     13.91 ±  5%     -13.8        0.10 ± 27%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath

The benchmark score (bytes written per second) improved by 992.8%.  The
swapout/swapin throughput improved by 1008% (from about 2.17GB/s to
24.04GB/s).  The performance difference is huge.  In the base kernel, the
THP is swapped out and split during the first round of writing, so in the
remaining rounds there is only normal page swapin and swapout.  In the
optimized kernel, the THP is kept after the first swapout, so THP swapin
and swapout are used in the remaining rounds.  This shows the key benefit
of swapping a THP out and in in one piece: the THP is kept instead of
being split.  The meminfo data confirms this: in the base kernel only 4.5%
of anonymous pages are THP during the test, while in the optimized kernel
that figure is 96.6%.  The TLB flushing IPIs (represented as
interrupts.CAL:Function_call_interrupts) were reduced by 95.6%, while the
cycles spent on spinlocks dropped from 13.9% to 0.1%.  These are
performance benefits of THP swapout/swapin too.
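
The derived figures quoted above can be reproduced from the table.  The
sketch below is only an illustration: it assumes vmstat.swap.si/so are
reported in KiB/s and that the THP share is AnonHugePages divided by
AnonPages, both of which are consistent with the quoted numbers but are
not stated in the text.  All function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Combined swapin + swapout rate in GiB/s, assuming KiB/s inputs. */
double swap_throughput_gib(uint64_t si_kib, uint64_t so_kib)
{
	return (double)(si_kib + so_kib) / (1024.0 * 1024.0);
}

/* Relative change from base to optimized, in percent. */
double percent_change(double base, double optimized)
{
	assert(base > 0.0);
	return (optimized - base) / base * 100.0;
}

/* Share of anonymous memory backed by THP, in percent. */
double thp_share_percent(uint64_t anon_huge, uint64_t anon)
{
	assert(anon > 0);
	return (double)anon_huge / (double)anon * 100.0;
}
```

Feeding in the table values (si 1020489 and so 1255093 for base, si
12156349 and so 13056114 for optimized) gives roughly 2.17 and 24.04, a
change of about +1008%.  The meminfo rows give about 4.5% and 96.6%.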

Below is the description for all steps of THP swap optimization.

Recently, the performance of storage devices has improved so fast that we
cannot saturate the disk bandwidth with a single logical CPU when doing
page swapping, even on a high-end server machine, because the performance
of storage devices has improved faster than that of a single logical CPU.
It seems that this trend will not change in the near future.  On the other
hand, THP is becoming more and more popular because of increased memory
sizes.  So it becomes necessary to optimize THP swap performance.

The advantages of swapping a THP out and in in one piece include:

- Various swap operations are batched for the THP.  Many operations need
  to be done only once per THP instead of once per normal page, for
  example, allocating/freeing the swap space, writing/reading the swap
  space, flushing the TLB, handling page faults, etc.  This will improve
  the performance of THP swap greatly.

- The THP swap space reads/writes will be large sequential IO (2M on
  x86_64).  This is particularly helpful for swapin, which is usually 4k
  random IO.  This will improve the performance of THP swap too.

- It will help with memory fragmentation, especially when THP is heavily
  used by the applications.  THP-order pages will be freed up after THP
  swapout.

- It will improve THP utilization on systems with swap turned on, because
  khugepaged is quite slow to collapse normal pages into a THP.  After a
  THP is split during swapout, it takes quite a long time for the normal
  pages to be collapsed back into a THP after being swapped in.  High THP
  utilization helps the efficiency of page-based memory management too.

There are some concerns regarding THP swapin, mainly because the possibly
enlarged read/write IO size (for swapout/swapin) may put more overhead on
the storage device.  To deal with that, THP swapin is turned on only when
necessary.  A new sysfs interface,
/sys/kernel/mm/transparent_hugepage/swapin_enabled, is added to configure
it.  It uses "always/never/madvise" logic, to be turned on globally,
turned off globally, or turned on only for VMAs with MADV_HUGEPAGE, etc.
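
The "always/never/madvise" logic can be pictured with the following
user-space sketch.  It is not the kernel implementation: the enum, the
function names and the boolean VMA flag are hypothetical and only model
the three settings described above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the three swapin_enabled settings. */
enum thp_swapin_mode {
	THP_SWAPIN_NEVER,
	THP_SWAPIN_MADVISE,
	THP_SWAPIN_ALWAYS,
	THP_SWAPIN_INVALID,
};

/* Map the sysfs keyword to a mode; unknown keywords are rejected. */
enum thp_swapin_mode parse_thp_swapin_mode(const char *s)
{
	if (!strcmp(s, "always"))
		return THP_SWAPIN_ALWAYS;
	if (!strcmp(s, "madvise"))
		return THP_SWAPIN_MADVISE;
	if (!strcmp(s, "never"))
		return THP_SWAPIN_NEVER;
	return THP_SWAPIN_INVALID;
}

/*
 * Decide whether a THP may be swapped in as a whole.
 * vma_madv_hugepage stands for "the VMA was marked with MADV_HUGEPAGE".
 */
bool thp_swapin_allowed(enum thp_swapin_mode mode, bool vma_madv_hugepage)
{
	switch (mode) {
	case THP_SWAPIN_ALWAYS:
		return true;
	case THP_SWAPIN_MADVISE:
		return vma_madv_hugepage;
	default:
		return false;
	}
}
```

With "madvise" selected, only VMAs marked with MADV_HUGEPAGE would get
whole-THP swapin in this model; every other VMA falls back to normal page
swapin.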


This patch (of 21):

Previously, the PMD swap operations were only enabled for
CONFIG_ARCH_ENABLE_THP_MIGRATION, because they were only used by the THP
migration support.  We will support PMD swap mappings to the huge swap
cluster and swap in the THP as a whole.  That will be enabled via
CONFIG_THP_SWAP and needs these PMD swap operations.  So enable the PMD
swap operations for CONFIG_THP_SWAP too.
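
The guard pattern used by this patch can be shown in a stand-alone
user-space sketch.  Everything prefixed with demo_/DEMO_ is a hypothetical
stand-in for the kernel types and config options, and the identity
conversion is a placeholder for the arch-specific encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for CONFIG_THP_SWAP; remove this line to get the stubs. */
#define DEMO_CONFIG_THP_SWAP 1

typedef struct { uint64_t val; } demo_swp_entry_t;
typedef struct { uint64_t pmd; } demo_pmd_t;

#if defined(DEMO_CONFIG_ARCH_ENABLE_THP_MIGRATION) || \
	defined(DEMO_CONFIG_THP_SWAP)
/* Real helpers: available when either option is enabled. */
static inline demo_swp_entry_t demo_pmd_to_swp_entry(demo_pmd_t pmd)
{
	demo_swp_entry_t entry = { pmd.pmd };
	return entry;
}

static inline demo_pmd_t demo_swp_entry_to_pmd(demo_swp_entry_t entry)
{
	demo_pmd_t pmd = { entry.val };
	return pmd;
}
#else
/* Stubs: keep callers compiling when neither option is enabled. */
static inline demo_swp_entry_t demo_pmd_to_swp_entry(demo_pmd_t pmd)
{
	demo_swp_entry_t entry = { 0 };
	(void)pmd;
	return entry;
}

static inline demo_pmd_t demo_swp_entry_to_pmd(demo_swp_entry_t entry)
{
	demo_pmd_t pmd = { 0 };
	(void)entry;
	return pmd;
}
#endif

/*
 * Holds only when one of the options is defined; with the stubs every
 * entry collapses to zero and the assertion fails for raw != 0.
 */
static inline void demo_check_round_trip(uint64_t raw)
{
	demo_swp_entry_t entry = { raw };
	assert(demo_pmd_to_swp_entry(demo_swp_entry_to_pmd(entry)).val == raw);
}
```

The point of the "||" guard is that a single definition of the helpers
serves both users, while the #else branch keeps common code building on
configurations that enable neither option.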

Link: http://lkml.kernel.org/r/20180622035151.6676-2-ying.huang@xxxxxxxxx
Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Shaohua Li <shli@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: Zi Yan <zi.yan@xxxxxxxxxxxxxx>
Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---


diff -puN arch/x86/include/asm/pgtable.h~mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap arch/x86/include/asm/pgtable.h
--- a/arch/x86/include/asm/pgtable.h~mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap
+++ a/arch/x86/include/asm/pgtable.h
@@ -1224,7 +1224,7 @@ static inline pte_t pte_swp_clear_soft_d
 	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
 }
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_THP_SWAP)
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
 	return pmd_set_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
diff -puN include/asm-generic/pgtable.h~mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h~mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap
+++ a/include/asm-generic/pgtable.h
@@ -675,7 +675,7 @@ static inline void ptep_modify_prot_comm
 #endif
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
-#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
+#if !defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) && !defined(CONFIG_THP_SWAP)
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
 {
 	return pmd;
diff -puN include/linux/swapops.h~mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap include/linux/swapops.h
--- a/include/linux/swapops.h~mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap
+++ a/include/linux/swapops.h
@@ -258,17 +258,7 @@ static inline int is_write_migration_ent
 
 #endif
 
-struct page_vma_mapped_walk;
-
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
-extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
-		struct page *page);
-
-extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
-		struct page *new);
-
-extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
-
+#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_THP_SWAP)
 static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
 {
 	swp_entry_t arch_entry;
@@ -286,6 +276,28 @@ static inline pmd_t swp_entry_to_pmd(swp
 	arch_entry = __swp_entry(swp_type(entry), swp_offset(entry));
 	return __swp_entry_to_pmd(arch_entry);
 }
+#else
+static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
+{
+	return swp_entry(0, 0);
+}
+
+static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
+{
+	return __pmd(0);
+}
+#endif
+
+struct page_vma_mapped_walk;
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
+		struct page *page);
+
+extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw,
+		struct page *new);
+
+extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
 
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
@@ -306,16 +318,6 @@ static inline void remove_migration_pmd(
 
 static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p) { }
 
-static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
-{
-	return swp_entry(0, 0);
-}
-
-static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
-{
-	return __pmd(0);
-}
-
 static inline int is_pmd_migration_entry(pmd_t pmd)
 {
 	return 0;
_

Patches currently in -mm which might be from ying.huang@xxxxxxxxx are

mm-clear_huge_page-move-order-algorithm-into-a-separate-function.patch
mm-huge-page-copy-target-sub-page-last-when-copy-huge-page.patch
mm-hugetlbfs-rename-address-to-haddr-in-hugetlb_cow.patch
mm-hugetlbfs-pass-fault-address-to-cow-handler.patch
mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch
mm-swap-fix-race-between-swapoff-and-some-swap-operations-v6.patch
mm-fix-race-between-swapoff-and-mincore.patch
mm-thp-swap-enable-pmd-swap-operations-for-config_thp_swap.patch
mm-thp-swap-make-config_thp_swap-depends-on-config_swap.patch
mm-thp-swap-support-pmd-swap-mapping-in-swap_duplicate.patch
mm-thp-swap-support-pmd-swap-mapping-in-swapcache_free_cluster.patch
mm-thp-swap-support-pmd-swap-mapping-in-free_swap_and_cache-swap_free.patch
mm-thp-swap-support-pmd-swap-mapping-when-splitting-huge-pmd.patch
mm-thp-swap-support-pmd-swap-mapping-in-split_swap_cluster.patch
mm-thp-swap-support-to-read-a-huge-swap-cluster-for-swapin-a-thp.patch
mm-thp-swap-swapin-a-thp-as-a-whole.patch
mm-thp-swap-support-to-count-thp-swapin-and-its-fallback.patch
mm-thp-swap-add-sysfs-interface-to-configure-thp-swapin.patch
mm-thp-swap-support-pmd-swap-mapping-in-swapoff.patch
mm-thp-swap-support-pmd-swap-mapping-in-madvise_free.patch
mm-cgroup-thp-swap-support-to-move-swap-account-for-pmd-swap-mapping.patch
mm-thp-swap-support-to-copy-pmd-swap-mapping-when-fork.patch
mm-thp-swap-free-pmd-swap-mapping-when-zap_huge_pmd.patch
mm-thp-swap-support-pmd-swap-mapping-for-madv_willneed.patch
mm-thp-swap-support-pmd-swap-mapping-in-mincore.patch
mm-thp-swap-support-pmd-swap-mapping-in-common-path.patch
mm-thp-swap-create-pmd-swap-mapping-when-unmap-the-thp.patch
mm-thp-avoid-to-split-thp-when-reclaim-madv_free-thp.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html