On 5 Feb 2025, at 18:51, Christoph Lameter (Ampere) <cl@xxxxxxxxxx> wrote:
>
> On Tue, 4 Feb 2025, Jessica Clarke wrote:
>
>> It’s not “no performance penalty”, there is a cost to tracking the MTE
>> tags for checking. In asynchronous (or asymmetric) mode that’s not too
>
>
> On Ampere Processor hardware there is no penalty since the logic is build
> into the usual read/write paths. This is by design. There may be on other
> platforms that cannot do this.

You helpfully cut out all the explanation of where the performance
penalty comes from. But if it’s as you say I can only assume your
design chooses to stall all stores until they have actually written, in
which case you have a performance cost compared with hardware that
omitted MTE or optimises for non-synchronous MTE. The literature on MTE
agrees that it is not no penalty (but can be low penalty).

I don’t really want to have some big debate here about the ins and outs
of MTE, it’s not the place for it, but I will stand up and point out
that claiming MTE to be “no performance penalty” is misrepresentative
of the truth.

Jess