On Tue, May 10, 2022 at 10:28:28AM -0700, Linus Torvalds wrote:
> Well, that's pretty conclusive.

Yap. It appears I don't have a production-type Icelake so I probably
can't show the numbers there but at least I can check whether there's an
improvement too.

> I'm obviously very happy with fsrm. I've been pushing for that thing
> for probably over two decades by now,

The time sounds about right - I'm closing in on two decades poking at
the kernel myself and I've yet to see a more complex feature I've been
advocating for, materialize.

> because I absolutely detest uarch optimizations for memset/memcpy that
> can never be done well in software anyway (because it depends not just
> on cache organization, but on cache sizes and dynamic cache hit/miss
> behavior of the load).

Yeah, you want all that cacheline aggregation to happen underneath where
it can do all the checks etc.

> And one of the things I always wanted to do was to just have
> memcpy/memset entirely inlined.
>
> In fact, if you go back to the 0.01 linux kernel sources, you'll see

LOL, I think I've seen those sources printed out on a wall somewhere.
:-)

> that they only compile with my bastardized version of gcc-1.40,
> because I made the compiler inline those things with 'rep movs/stos',
> and there was no other implementation of memcpy/memset at all.

Yeah, I have it on my todo to look at inlining the other primitives too,
and see whether that brings any improvements. Now our patching
infrastructure is nicely mature too so that we can be very creative
there.

> That was a bit optimistic at the time, but here we are, 30+ years
> later and it is finally looking possible, at least on some uarchs.

Yap, it takes "only" 30+ years. :-\

And when you think of all the crap stuff that got added in silicon and
*removed* *again* in the meantime... but I'm optimistic now that
Murphy's Law is not going to hold true anymore, we will finally start
optimizing hardware *and* software.

:-)))

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette