+ mm-cow-optimise-pte-dirty-accessed-bits-handling-in-fork.patch added to -mm tree

The patch titled
     Subject: mm/cow: optimise pte dirty/accessed bits handling in fork
has been added to the -mm tree.  Its filename is
     mm-cow-optimise-pte-dirty-accessed-bits-handling-in-fork.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-cow-optimise-pte-dirty-accessed-bits-handling-in-fork.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-cow-optimise-pte-dirty-accessed-bits-handling-in-fork.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Nicholas Piggin <npiggin@xxxxxxxxx>
Subject: mm/cow: optimise pte dirty/accessed bits handling in fork

fork clears the dirty/accessed bits from new ptes in the child.  This
logic has existed since mapped page reclaim was done by scanning ptes,
when it may have been quite important.  Today, with physical-based pte
scanning, there is less reason to clear these bits: dirty bits are all
tested and cleared together, so any one dirty bit is equivalent to many
dirty bits.  Any one young bit is treated similarly to many young bits,
but not quite the same; a comment has been added where the behaviour
differs.

This eliminates a major source of the faults that powerpc/radix
requires to set dirty/accessed bits in ptes, speeding up a fork/exit
microbenchmark by about 5% on POWER9 (16600 -> 17500 fork/execs per
second).

Skylake appears to have a micro-fault overhead too.  A test allocates
4GB of anonymous memory, reads each page, then forks, and times the
child reading a byte from each page.  The child's first pass over the
pages takes about 1000 cycles per page; the second pass takes about 27
cycles (a TLB miss).  With no additional minor faults measured during
either child pass, and the page array well exceeding TLB capacity, the
large cost must come from micro-faults taken to set the accessed bit.
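
A minimal user-space sketch of such a test is included below for
illustration.  The 4GB allocation, the one-byte-per-page reads and the
two timed child passes follow the description above; the rdtsc-based
cycle counting, the 4kB page-size assumption, the x86-only header and
the parent writing (rather than only reading) each page to populate it
are illustrative choices, not details of the original test.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <x86intrin.h>          /* __rdtsc(); assumes an x86 build */

#define MEM_SIZE        (4UL << 30)     /* 4GB of anonymous memory */
#define PAGE_SIZE_B     4096UL          /* assumed base page size */

static void time_pass(volatile char *mem, const char *name)
{
        unsigned long long start = __rdtsc();
        unsigned long sum = 0;

        for (unsigned long off = 0; off < MEM_SIZE; off += PAGE_SIZE_B)
                sum += mem[off];        /* read one byte per page */

        printf("%s: %.1f cycles/page (sum %lu)\n", name,
               (double)(__rdtsc() - start) / (MEM_SIZE / PAGE_SIZE_B), sum);
}

int main(void)
{
        char *mem = mmap(NULL, MEM_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (mem == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /* Parent touches every page so the ptes exist before fork(). */
        for (unsigned long off = 0; off < MEM_SIZE; off += PAGE_SIZE_B)
                mem[off] = 1;

        if (fork() == 0) {
                /*
                 * Child: the first pass shows the per-page cost right
                 * after fork(); the second pass should pay only for
                 * TLB misses.
                 */
                time_pass(mem, "child pass 1");
                time_pass(mem, "child pass 2");
                _exit(0);
        }
        wait(NULL);
        return 0;
}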

Link: http://lkml.kernel.org/r/20180828112034.30875-3-npiggin@xxxxxxxxx
Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/huge_memory.c |    2 --
 mm/memory.c      |   10 +++++-----
 mm/vmscan.c      |    8 ++++++++
 3 files changed, 13 insertions(+), 7 deletions(-)

--- a/mm/huge_memory.c~mm-cow-optimise-pte-dirty-accessed-bits-handling-in-fork
+++ a/mm/huge_memory.c
@@ -977,7 +977,6 @@ int copy_huge_pmd(struct mm_struct *dst_
 		pmdp_set_wrprotect(src_mm, addr, src_pmd);
 		pmd = pmd_wrprotect(pmd);
 	}
-	pmd = pmd_mkold(pmd);
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 
 	ret = 0;
@@ -1071,7 +1070,6 @@ int copy_huge_pud(struct mm_struct *dst_
 		pudp_set_wrprotect(src_mm, addr, src_pud);
 		pud = pud_wrprotect(pud);
 	}
-	pud = pud_mkold(pud);
 	set_pud_at(dst_mm, addr, dst_pud, pud);
 
 	ret = 0;
--- a/mm/memory.c~mm-cow-optimise-pte-dirty-accessed-bits-handling-in-fork
+++ a/mm/memory.c
@@ -1028,12 +1028,12 @@ copy_one_pte(struct mm_struct *dst_mm, s
 	}
 
 	/*
-	 * If it's a shared mapping, mark it clean in
-	 * the child
+	 * Child inherits dirty and young bits from parent. There is no
+	 * point clearing them because any cleaning or aging has to walk
+	 * all ptes anyway, and it will notice the bits set in the parent.
+	 * Leaving them set avoids stalls and even page faults on CPUs that
+	 * handle these bits in software.
 	 */
-	if (vm_flags & VM_SHARED)
-		pte = pte_mkclean(pte);
-	pte = pte_mkold(pte);
 
 	page = vm_normal_page(vma, addr, pte);
 	if (page) {
--- a/mm/vmscan.c~mm-cow-optimise-pte-dirty-accessed-bits-handling-in-fork
+++ a/mm/vmscan.c
@@ -1021,6 +1021,14 @@ static enum page_references page_check_r
 		 * to look twice if a mapped file page is used more
 		 * than once.
 		 *
+		 * fork() will set referenced bits in child ptes despite
+		 * not having been accessed, to avoid micro-faults of
+		 * setting accessed bits. This heuristic is not perfectly
+		 * accurate in other ways -- multiple map/unmap in the
+		 * same time window would be treated as multiple references
+		 * despite same number of actual memory accesses made by
+		 * the program.
+		 *
 		 * Mark it and spare it for another trip around the
 		 * inactive list.  Another page table reference will
 		 * lead to its activation.
_

Patches currently in -mm which might be from npiggin@xxxxxxxxx are

mm-cow-dont-bother-write-protectig-already-write-protected-huge-pages.patch
mm-cow-optimise-pte-dirty-accessed-bits-handling-in-fork.patch
mm-optimise-pte-dirty-accessed-bit-setting-by-demand-based-pte-insertion.patch



