On Tue, May 10, 2022 at 2:31 AM Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> clear_user_original:
> Amean: 9219.71 (Sum: 6340154910, samples: 687674)
>
> fsrm:
> Amean: 8030.63 (Sum: 5522277720, samples: 687652)

Well, that's pretty conclusive.

I'm obviously very happy with fsrm. I've been pushing for that thing
for probably over two decades by now, because I absolutely detest
uarch optimizations for memset/memcpy that can never be done well in
software anyway (because it depends not just on cache organization,
but on cache sizes and dynamic cache hit/miss behavior of the load).

And one of the things I always wanted to do was to just have
memcpy/memset entirely inlined.

In fact, if you go back to the 0.01 linux kernel sources, you'll see
that they only compile with my bastardized version of gcc-1.40,
because I made the compiler inline those things with 'rep movs/stos',
and there was no other implementation of memcpy/memset at all.

That was a bit optimistic at the time, but here we are, 30+ years
later and it is finally looking possible, at least on some uarchs.

              Linus