Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush instruction within the same inner shareable domain

Hi,

First of all, thank you for the comments on the patch.

I am still looking for a solution to this problem. After investigating
it further, I think the kernel's TLB flush mechanism itself needs to be
improved to fix the problem completely.

So I'd like to restart the discussion. I will first summarize the
problem, and then discuss how to fix it.

Summary of the problem:
A few months ago I proposed patches to solve a performance problem
caused by TLB flushes.[1]

The problem is that a TLB flush on one core affects all other cores,
even those that have nothing to flush, and this degrades performance.

In that thread, I explained that:
* I found a performance problem caused by the TLBI-is instruction.
* The problem occurs as follows:
  1) On one core, the OS flushes the TLB using the TLBI-is instruction
  2) The TLBI-is instruction is broadcast to all other cores, and
     each core receives a hard-wired signal
  3) Each core checks whether it has TLB entries that match the
     specified ASID/VA
  4) This check causes performance degradation
* We ran FWQ[2] and detected OS jitter due to this problem. This noise
  is serious for HPC usage.

Here, noise means the difference between the maximum and the minimum
time that the same work takes.
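
As a rough illustration of this definition, the noise for a set of
timed runs of the same work could be computed as below. This is only a
sketch: the function name fwq_noise_ns and the nanosecond unit are
assumptions made for this example and are not taken from the FWQ
benchmark itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Noise as defined above: the longest elapsed time minus the shortest
 * elapsed time over n runs of the same fixed amount of work.
 * elapsed_ns holds one measurement per run (hypothetical unit: ns).
 */
uint64_t fwq_noise_ns(const uint64_t *elapsed_ns, size_t n)
{
    assert(elapsed_ns != NULL && n > 0);

    uint64_t min = elapsed_ns[0];
    uint64_t max = elapsed_ns[0];

    for (size_t i = 1; i < n; i++) {
        if (elapsed_ns[i] < min)
            min = elapsed_ns[i];
        if (elapsed_ns[i] > max)
            max = elapsed_ns[i];
    }
    return max - min;
}
```

On a core that is never disturbed, every run would take about the same
time and the result would be close to zero.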

How to fix:
I think the cause is the TLB flush by TLBI-is, because the instruction
affects cores that are unrelated to the flush.

So the previous patch I posted does the following:
* Use mm_cpumask in mm_struct to find the CPUs that need a TLB flush
* Execute TLBI instead of TLBI-is, and only on the CPUs specified by
  mm_cpumask
  (This is the same behavior as arm32 and x86)
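
To make the difference between the two approaches concrete, here is a
small user-space model. It is not kernel code and not the patch: it
only computes which other cores are disturbed in each case. All names
(MODEL_NR_CPUS, cpu_set_model_t, disturbed_by_broadcast,
disturbed_by_targeted, count_cpus) are hypothetical and exist only for
this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8u  /* hypothetical core count for this model */

/* Bit i set means core i has to check or flush its TLB. */
typedef uint32_t cpu_set_model_t;

/* Broadcast flush (TLBI-is): every other core in the domain is hit. */
cpu_set_model_t disturbed_by_broadcast(unsigned int issuing_cpu)
{
    assert(issuing_cpu < MODEL_NR_CPUS);

    cpu_set_model_t all = (1u << MODEL_NR_CPUS) - 1u;
    return all & ~(1u << issuing_cpu);
}

/*
 * Targeted flush: only the cores recorded in the mm's cpumask are hit,
 * again not counting the core that issues the flush.
 */
cpu_set_model_t disturbed_by_targeted(unsigned int issuing_cpu,
                                      cpu_set_model_t mm_mask)
{
    assert(issuing_cpu < MODEL_NR_CPUS);

    cpu_set_model_t all = (1u << MODEL_NR_CPUS) - 1u;
    return mm_mask & all & ~(1u << issuing_cpu);
}

/* Number of cores in a set. */
unsigned int count_cpus(cpu_set_model_t set)
{
    unsigned int n = 0;

    while (set) {
        n += set & 1u;
        set >>= 1;
    }
    return n;
}
```

In this model a single-threaded process whose mm has only run on the
issuing core disturbs no other core with the targeted flush, while the
broadcast flush always disturbs every other core.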

After the discussion of this patch, I received the following comments:
1) The patch switches between the original flush by TLBI-is and the new
flush by TLBI with a boot parameter. This implementation is not
acceptable because it is hard to maintain.
2) Even if the patch fixes this problem, it may cause another
performance problem.

I'd like to redo the implementation with these points in mind.
For the second comment above, I will run a benchmark to analyze the
impact on performance.
Please let me know if there are other points I should take into
consideration.

[1] https://lkml.org/lkml/2019/6/17/703
[2] https://asc.llnl.gov/sequoia/benchmarks/FTQ_summary_v1.1.pdf

Thanks,
QI Fuli


On 6/17/19 11:32 PM, Takao Indoh wrote:
> From: Takao Indoh <indou.takao@xxxxxxxxxxx>
> 
> I found a performance issue related to the implementation of Linux's TLB
> flush for arm64.
> 
> When I run a single-threaded test program in a moderate environment, it
> usually takes 39ms to finish its work. However, when I put a small
> application, which just calls mprotect() continuously, on one of the sibling
> cores and run it simultaneously, the test program slows down significantly.
> It takes 49ms (125%) on ThunderX2. I also detected the same problem on
> ThunderX1 and Fujitsu A64FX.
> 
> I suppose the root cause of this issue is the implementation of Linux's TLB
> flush for arm64, especially the use of the TLBI-is instruction, which is
> broadcast to all processor cores in the system. In the above situation,
> TLBI-is is issued by mprotect().
> 
> This is not a problem for a small environment, but it causes significant
> performance noise in a large-scale HPC environment, which has more than
> a thousand nodes with a low-latency interconnect.
> 
> To fix this problem, this patch adds a new boot parameter,
> 'disable_tlbflush_is'. In the case of flush_tlb_mm() *without* this
> parameter, the TLB entry is invalidated by __tlbi(aside1is, asid). With this
> instruction, all CPUs within the same inner shareable domain check whether
> they have TLB entries with this ASID, which causes performance noise. OTOH,
> when this new parameter is specified, the TLB entry is invalidated by
> __tlbi(aside1, asid) only on the CPUs specified by mm_cpumask(mm).
> Therefore the TLB flush is done on the minimum set of CPUs and the
> performance problem does not occur. I have confirmed that the performance
> problem is fixed by this patch.
> 
> Takao Indoh (2):
>    arm64: mm: Restore mm_cpumask (revert commit 38d96287504a ("arm64: mm:
>      kill mm_cpumask usage"))
>    arm64: tlb: Add boot parameter to disable TLB flush within the same
>      inner shareable domain
> 
>   .../admin-guide/kernel-parameters.txt         |   4 +
>   arch/arm64/include/asm/mmu_context.h          |   7 +-
>   arch/arm64/include/asm/tlbflush.h             |  61 ++-----
>   arch/arm64/kernel/Makefile                    |   2 +-
>   arch/arm64/kernel/smp.c                       |   6 +
>   arch/arm64/kernel/tlbflush.c                  | 155 ++++++++++++++++++
>   arch/arm64/mm/context.c                       |   2 +
>   7 files changed, 186 insertions(+), 51 deletions(-)
>   create mode 100644 arch/arm64/kernel/tlbflush.c
> 



