On 7/8/19 8:25 PM, Jon Masters wrote:
> On 7/2/19 10:45 PM, qi.fuli@xxxxxxxxxxx wrote:
>
>> However, we found that with the increase of that the TLB flash was called,
>> the noise was also increasing. Here we understood that the cause of this
>> issue is the implementation of Linux's TLB flush for arm64, especially use of
>> TLBI-is instruction which is a broadcast to all processor core on the system.
>
> Are you saying that for a microbenchmark in which very large numbers of
> threads are created and destroyed rapidly there are a large number of
> associated tlb range flushes which always use broadcast TLBIs?
>
> If that's the case, and the hardware doesn't do any ASID filtering and
> each TLBI results in a DVM to every PE, would it make sense to look at
> whether there are ways to improve batching/switch to an IPI approach
> rather than relying on broadcasts, as a more generic solution?

What I meant was a heuristic to do this automatically, rather than via a
command line.

Jon.