On Wed, Jun 16, 2021 at 1:46 PM Guo Ren <guoren@xxxxxxxxxx> wrote:
>
> Hi Matteo,
>
> Have you tried Glibc generic implementation code?
> ref: https://lore.kernel.org/linux-arch/20190629053641.3iBfk9-I_D29cDp9yJnIdIg7oMtHNZlDmhLQPTumhEc@z/#t
>
> If Glibc codes have the same performance in your hardware, then you
> could give a generic implementation first.
>

Hi,

I had a look. It appears to be an unrolled C version that uses the
'register' keyword. The same code was already merged in nios2:

https://elixir.bootlin.com/linux/latest/source/arch/nios2/lib/memcpy.c#L68

I copied _wordcopy_fwd_aligned() from Glibc, and the result is very
similar to that of the other versions:

[ 563.359126] Strings selftest: memcpy(src+7, dst+7): 257 Mb/s

Regards,
--
per aspera ad upstream