On 7/2/19 10:45 PM, qi.fuli@xxxxxxxxxxx wrote:
> However, we found that with the increase of that the TLB flash was called,
> the noise was also increasing. Here we understood that the cause of this
> issue is the implementation of Linux's TLB flush for arm64, especially use of
> TLBI-is instruction which is a broadcast to all processor core on the system.

Are you saying that, for a microbenchmark in which very large numbers of
threads are created and destroyed rapidly, there are a large number of
associated TLB range flushes, and that these always use broadcast TLBIs?

If that's the case, and the hardware doesn't do any ASID filtering and each
TLBI results in a DVM message to every PE, would it make sense to look at
whether batching could be improved, or whether switching to an IPI-based
approach rather than relying on broadcasts would help, as a more generic
solution?

Jon.