On 11/1/19 1:28 PM, Will Deacon wrote:
> On Fri, Nov 01, 2019 at 09:56:05AM +0000, qi.fuli@xxxxxxxxxxx wrote:
>> In this thread, I explained that:
>>
>>  * I found a performance problem which is caused by the TLBI-is instruction.
>>  * The problem occurs like this:
>>    1) On one core, the OS tries to flush the TLB using a TLBI-is instruction.
>>    2) The TLBI-is instruction causes a broadcast to all other cores, and
>>       each core receives a hard-wired signal.
>>    3) Each core checks whether it has TLB entries with the specified
>>       ASID/VA.
(the above confuses implementation with architecture)

<snip>
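The sequence qi.fuli describes corresponds to the broadcast ("inner shareable") TLB maintenance the arm64 kernel already issues for things like ASID-wide flushes. Below is a minimal sketch, loosely modelled on the arch/arm64 tlbflush helpers; the function name and exact barrier placement are illustrative, not the upstream code.

/*
 * Minimal sketch of a broadcast (inner-shareable) ASID invalidation,
 * loosely modelled on the arch/arm64 TLB flush helpers.  The "is" in
 * "aside1is" is what makes this a broadcast: the invalidation is sent
 * to every core in the inner-shareable domain, which is the behaviour
 * under discussion in this thread.
 */
static inline void flush_tlb_asid_broadcast(unsigned long asid)
{
	/* Make prior page-table updates visible before the invalidation. */
	asm volatile("dsb ishst" : : : "memory");

	/*
	 * TLBI ASIDE1IS: invalidate all stage-1 EL1 entries for this ASID
	 * on every core in the inner-shareable domain.  The ASID lives in
	 * bits [63:48] of the register operand.
	 */
	asm volatile("tlbi aside1is, %0" : : "r" (asid << 48));

	/* Wait until the broadcast invalidation has completed everywhere. */
	asm volatile("dsb ish" : : : "memory");
}

On an implementation that stalls whenever it receives such a broadcast, every invalidation issued anywhere in the system perturbs every other core, which is the jitter being reported.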
> I think it's worth bearing in mind that I have little sympathy for the
> problem that you are seeing. As far as I can tell, you've done the
> following:
>
>   1. You designed a CPU micro-architecture that stalls whenever it
>      receives a TLB invalidation request.
s/SPARC/Arm/ && wire in DVM
>   2. You integrated said CPU design into a system where broadcast TLB
>      invalidation is not filtered and therefore stalls every CPU every
>      time that /any/ TLB invalidation is broadcast.
>
>   3. You deployed a mixture of Linux and jitter-sensitive software on
>      this system, and now you're failing to meet your performance
>      requirements.
>
> Have I got that right?
>
> If so, given that your CPU design isn't widely available, nobody else
> appears to have made this mistake and jitter hasn't been reported as an
> issue for any other systems, it's very unlikely that we're going to make
> invasive upstream kernel changes to support you. I'm sorry, but all I can
> suggest is that you check that your micro-architecture and performance
> requirements are aligned with the design of Linux *before* building
> another machine like this in future.
>
> I hate to be blunt, but I also don't want to waste your time.
I always tried to ask nicely for the above to be heeded. There's a difference between "hi, our implementation doesn't scale, and here's why" and "there's a problem with all TLBIs...". There isn't. The problem is the implementation, and that should have been called out first thing.
Jon.

--
Computer Architect