Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:

> Commit 5d1904204c99 ("mremap: fix race between mremap() and page cleanning")
> fixed races between mremap and other operations for both file-backed and
> anonymous mappings. The file-backed case was the most critical as it allowed
> the possibility that data could be changed on a physical page after
> page_mkclean returned, which could trigger data loss or data integrity
> issues. A customer reported that the cost of the TLB flushes for anonymous
> regions was excessive, resulting in a 30-50% drop in overall performance
> since this commit on a microbenchmark. Unfortunately I neither have access
> to the test case nor can I describe what it does other than saying that
> mremap operations dominate heavily.
>
> This patch increases the LATENCY_LIMIT to handle TLB flushes on a
> PMD boundary instead of every 64 pages. This reduces the number of TLB
> shootdowns by a factor of 8, which is not reported to completely restore
> performance but gets it within an acceptable percentage. The given metric
> here is simply described as "higher is better".
>
> Baseline that was known good
> 002: Metric: 91.05
> 004: Metric: 109.45
> 008: Metric: 73.08
> 016: Metric: 58.14
> 032: Metric: 61.09
> 064: Metric: 57.76
> 128: Metric: 55.43
>
> Current
> 001: Metric: 54.98
> 002: Metric: 56.56
> 004: Metric: 41.22
> 008: Metric: 35.96
> 016: Metric: 36.45
> 032: Metric: 35.71
> 064: Metric: 35.73
> 128: Metric: 34.96
>
> With patch
> 001: Metric: 61.43
> 002: Metric: 81.64
> 004: Metric: 67.92
> 008: Metric: 51.67
> 016: Metric: 50.47
> 032: Metric: 52.29
> 064: Metric: 50.01
> 128: Metric: 49.04
>
> So for low thread counts performance is not restored, but for larger
> numbers of threads it is closer to the "known good" baseline. The downside
> is that PTL hold times will be slightly higher, but it's unlikely that an
> mremap and another operation will contend on the same PMD. This is the
> first time I encountered a realistic workload that was mremap intensive
> (thousands of calls per second with small ranges dominating).
>
> Using a different mremap-intensive workload that is not representative of
> the real workload, there is little difference observed outside of noise in
> the headline metrics. However, the TLB shootdowns are reduced by 11% on
> average, and at the peak, TLB shootdowns were reduced by 21%. Interrupts
> were sampled every second while the workload ran to get those figures.
> It's known that the figures will vary as the non-representative load is
> non-deterministic.
>
> An alternative patch was posted that should have significantly reduced the
> TLB flushes, but unfortunately it does not perform as well as this version
> on the customer test case. If revisited, the two patches can stack on top
> of each other.
>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxxx>
> ---
>  mm/mremap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 049470aa1e3e..b5017cb2e1e9 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -191,7 +191,7 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
>               drop_rmap_locks(vma);
>  }
>
> -#define LATENCY_LIMIT (64 * PAGE_SIZE)
> +#define LATENCY_LIMIT (PMD_SIZE)
>
>  unsigned long move_page_tables(struct vm_area_struct *vma,
>               unsigned long old_addr, struct vm_area_struct *new_vma,

This LATENCY_LIMIT is only used in move_page_tables() in the following
manner:

        next = (new_addr + PMD_SIZE) & PMD_MASK;
        if (extent > next - new_addr)
                extent = next - new_addr;
        if (extent > LATENCY_LIMIT)
                extent = LATENCY_LIMIT;

If LATENCY_LIMIT is to be changed to PMD_SIZE, then IIUC the last
condition is not required, and LATENCY_LIMIT can just be removed
(assuming there is no underflow case that hides somewhere). No?
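To sanity-check that reasoning, here is a quick userspace sketch. It is not
kernel code; it only models the clamping arithmetic, assuming typical x86-64
constants (PAGE_SIZE = 4K, PMD_SIZE = 2M) and a hypothetical main() harness.
It demonstrates that once extent has been clamped to the next PMD boundary,
it can never exceed PMD_SIZE, so a LATENCY_LIMIT equal to PMD_SIZE never
changes the result:

        /* Userspace model of the extent clamping in move_page_tables(). */
        #include <assert.h>
        #include <stdio.h>

        #define PAGE_SIZE       (4096UL)                 /* assumed 4K pages */
        #define PMD_SIZE        (512UL * PAGE_SIZE)      /* assumed 2M PMDs */
        #define PMD_MASK        (~(PMD_SIZE - 1))
        #define LATENCY_LIMIT   (PMD_SIZE)

        int main(void)
        {
                unsigned long new_addr, next, extent;

                /* Walk page-aligned addresses and candidate extents. */
                for (new_addr = 0; new_addr < 4 * PMD_SIZE; new_addr += PAGE_SIZE) {
                        for (extent = PAGE_SIZE; extent <= 2 * PMD_SIZE; extent += PAGE_SIZE) {
                                unsigned long clamped = extent;

                                /* First clamp: do not cross the next PMD boundary. */
                                next = (new_addr + PMD_SIZE) & PMD_MASK;
                                if (clamped > next - new_addr)
                                        clamped = next - new_addr;

                                /* next - new_addr is at most PMD_SIZE, so ... */
                                assert(clamped <= PMD_SIZE);
                                /* ... the LATENCY_LIMIT clamp is a no-op here. */
                                assert(clamped <= LATENCY_LIMIT);
                        }
                }
                printf("LATENCY_LIMIT clamp never triggers when it equals PMD_SIZE\n");
                return 0;
        }

Built with a plain cc, the asserts never fire, which matches the observation
that the second clamp becomes dead code once LATENCY_LIMIT == PMD_SIZE.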