On Thu, Dec 06, 2018 at 08:42:03PM +0000, Alexander Van Brunt wrote:
> > > > If we roll a TLB invalidation routine without the trailing DSB, what sort of
> > > > performance does that get you?
> > >
> > > It is not as good. In some cases, it is really bad. Skipping the invalidate was
> > > the most consistent and fast implementation.
>
> > My problem with that is it's not really much different to just skipping the
> > page table update entirely. Skipping the DSB is closer to what is done on
> > x86, where we bound the stale entry time to the next context-switch.
>
> Which of the three implementations is the "that" and "it" in the first sentence?

that = it = skipping the whole invalidation + the DSB

> > Given that I already queued the version without the DSB, we have the choice
> > to either continue with that or to revert it and go back to the previous
> > behaviour. Which would you prefer?
>
> To me, skipping the DSB is a win over doing the invalidate and the DSB because
> it is faster on average.
>
> DSBs have a big impact on the performance of other CPUs in the inner shareable
> domain because of the ordering requirements. For example, we have observed
> Cortex A57s stalling all CPUs in the cluster until Device accesses complete.
>
> Would you be open to a patch on top of the DSB skipping patch that skips the
> whole invalidate?

I don't think so; we don't have an upper bound on how long we'll have a stale
TLB if we remove the invalidation completely.

Will