On Thu, 6 Feb 2025, Jessica Clarke wrote:

> On 5 Feb 2025, at 18:51, Christoph Lameter (Ampere) <cl@xxxxxxxxxx> wrote:
>
> > On Ampere Processor hardware there is no penalty since the logic is build
> > into the usual read/write paths. This is by design. There may be on other
> > platforms that cannot do this.
>
> You helpfully cut out all the explanation of where the performance
> penalty comes from. But if it’s as you say I can only assume your
> design chooses to stall all stores until they have actually written, in
> which case you have a performance cost compared with hardware that
> omitted MTE or optimises for non-synchronous MTE. The literature on MTE
> agrees that it is not no penalty (but can be low penalty). I don’t
> really want to have some big debate here about the ins and outs of MTE,
> it’s not the place for it, but I will stand up and point out that
> claiming MTE to be “no performance penalty” is misrepresentative of the
> truth

I cannot share details since this information has not been released to
the public yet. I hear that a whitepaper explaining this feature will be
coming soon. The AmpereOne processors were released a couple of months
ago.

I also see that KASAN_HW_TAGS exists, but this means that the tags can
only be used with CONFIG_KASAN, which is a kernel configuration for debug
purposes.

What we are interested in is a *production* implementation with minimal
software overhead that will be the default on ARM64 if the appropriate
hardware is detected.

That in turn will hopefully allow the removal of other software
instrumentation that is currently used to keep small objects secure and
that in turn creates overhead.