+ mremap-remove-latency_limit-from-mremap-to-reduce-the-number-of-tlb-shootdowns.patch added to -mm tree

akpm@xxxxxxxxxxxxxxxxxxxx · Wed, 06 Jun 2018 14:40:36 -0700

The patch titled
     Subject: mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns
has been added to the -mm tree.  Its filename is
     mremap-remove-latency_limit-from-mremap-to-reduce-the-number-of-tlb-shootdowns.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mremap-remove-latency_limit-from-mremap-to-reduce-the-number-of-tlb-shootdowns.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mremap-remove-latency_limit-from-mremap-to-reduce-the-number-of-tlb-shootdowns.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Subject: mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns

5d1904204c99 ("mremap: fix race between mremap() and page cleanning")
fixed races between mremap and other operations for both file-backed and
anonymous mappings.  The file-backed was the most critical as it allowed
the possibility that data could be changed on a physical page after
page_mkclean returned which could trigger data loss or data integrity
issues.  A customer reported that the cost of the TLBs for anonymous
regressions was excessive and resulting in a 30-50% drop in performance
overall since this commit on a microbenchmark.  Unfortunately I neither
have access to the test-case nor can I describe what it does other than
saying that mremap operations dominate heavily.

This patch removes the LATENCY_LIMIT to handle TLB flushes on a PMD
boundary instead of every 64 pages to reduce the number of TLB shootdowns
by a factor of 8 in the ideal case.  LATENCY_LIMIT was almost certainly
used originally to limit the PTL hold times but the latency savings are
likely offset by the cost of IPIs in many cases.  This patch is not
reported to completely restore performance but gets it within an
acceptable percentage.  The given metric here is simply described as
"higher is better".

Baseline that was known good
002:  Metric:       91.05
004:  Metric:      109.45
008:  Metric:       73.08
016:  Metric:       58.14
032:  Metric:       61.09
064:  Metric:       57.76
128:  Metric:       55.43

Current
001:  Metric:       54.98
002:  Metric:       56.56
004:  Metric:       41.22
008:  Metric:       35.96
016:  Metric:       36.45
032:  Metric:       35.71
064:  Metric:       35.73
128:  Metric:       34.96

With patch
001:  Metric:       61.43
002:  Metric:       81.64
004:  Metric:       67.92
008:  Metric:       51.67
016:  Metric:       50.47
032:  Metric:       52.29
064:  Metric:       50.01
128:  Metric:       49.04

So for low threads, it's not restored but for larger number of threads,
it's closer to the "known good" baseline.

Using a different mremap-intensive workload that is not representative of
the real workload there is little difference observed outside of noise in
the headline metrics However, the TLB shootdowns are reduced by 11% on
average and at the peak, TLB shootdowns were reduced by 21%.  Interrupts
were sampled every second while the workload ran to get those figures. 
It's known that the figures will vary as the non-representative load is
non-deterministic.

An alternative patch was posted that should have significantly reduced the
TLB flushes but unfortunately it does not perform as well as this version
on the customer test case.  If revisited, the two patches can stack on top
of each other.

Link: http://lkml.kernel.org/r/20180606183803.k7qaw2xnbvzshv34@xxxxxxxxxxxxxxxxxxx
Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Reviewed-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Nadav Amit <nadav.amit@xxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Aaron Lu <aaron.lu@xxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---


diff -puN mm/mremap.c~mremap-remove-latency_limit-from-mremap-to-reduce-the-number-of-tlb-shootdowns mm/mremap.c

--- a/mm/mremap.c~mremap-remove-latency_limit-from-mremap-to-reduce-the-number-of-tlb-shootdowns
+++ a/mm/mremap.c
@@ -191,8 +191,6 @@ static void move_ptes(struct vm_area_str
 		drop_rmap_locks(vma);
 }
 
-#define LATENCY_LIMIT	(64 * PAGE_SIZE)
-
 unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
@@ -247,8 +245,6 @@ unsigned long move_page_tables(struct vm
 		next = (new_addr + PMD_SIZE) & PMD_MASK;
 		if (extent > next - new_addr)
 			extent = next - new_addr;
-		if (extent > LATENCY_LIMIT)
-			extent = LATENCY_LIMIT;
 		move_ptes(vma, old_pmd, old_addr, old_addr + extent, new_vma,
 			  new_pmd, new_addr, need_rmap_locks, &need_flush);
 	}
_

Patches currently in -mm which might be from mgorman@xxxxxxxxxxxxxxxxxxx are

mremap-remove-latency_limit-from-mremap-to-reduce-the-number-of-tlb-shootdowns.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html