On Mon, 2018-01-29 at 18:04 -0600, Segher Boessenkool wrote:
>
> Things like a good memcpy I wouldn't even try to write in inline
> assembler, not even for a very simple CPU; memcpy is so important
> that writing ~100 lines of assembler code is well worth it.  Inline
> assembler is more suitable for access to, say, instructions writing
> special registers that GCC does not know about.

Which sort of brings us back to my original reply ...  The handling of
the movua.l instruction should be fixed/improved in the compiler.

Writing an über-optimized memcpy is surely not a bad thing to do, but
keep in mind that GCC will also try to inline memcpy calls, and when it
does, it will not inline the user-defined memcpy code but instead emit
its own code.

There is even already some support in GCC to use movua.l for memcpy.
The comment there says:

  /* If we could use mov.l to move words and dest is word-aligned, we
     can use movua.l for loads and still generate a relatively short
     and efficient sequence.  */

I think it's more useful to improve the code generated for inlined
memcpy calls, which are normally used for short copies.  Programs that
copy huge amounts of data (without doing anything with it) are strange,
I think.  It's more efficient to avoid copying data.

Just for your reference, below are some related known issues on SH:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77610
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52480

The code in GCC that deals with memcpy on SH is here:

https://github.com/gcc-mirror/gcc/blob/master/gcc/config/sh/sh-mem.cc

It might be useful for you to know the conditions under which GCC will
do what.

Cheers,
Oleg
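
P.S. A minimal sketch of the kind of short, fixed-size copy meant above.
The helper name load_u32_unaligned is made up for illustration and does
not come from GCC or from this thread.  With optimization enabled and
the memcpy builtin not disabled, GCC will typically expand a
constant-size call like this inline instead of calling any memcpy
function, so a hand-written memcpy would not be used here.  Whether the
inline expansion ends up using movua.l depends on the target options and
the GCC version; that part is an assumption you would have to check in
the generated assembly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load a 32-bit value from an address that may
   not be 4-byte aligned.  The copy size is a compile-time constant,
   which is the case GCC usually expands inline rather than emitting a
   call to memcpy.  */
static inline uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* small constant-size copy */
    return v;
}
```

Compiling something like this with -O2 -S and looking at the output is
a quick way to see what the compiler actually emits for the copy.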