On Sat, Sep 11, 2021 at 11:49 AM Palmer Dabbelt <palmer@xxxxxxxxxxx> wrote:
>
> On Thu, 05 Aug 2021 03:31:04 PDT (-0700), mcroce@xxxxxxxxxxxxxxxxxxx wrote:
> > On Wed, Aug 4, 2021 at 10:40 PM Palmer Dabbelt <palmer@xxxxxxxxxxx> wrote:
> >>
> >> On Tue, 03 Aug 2021 09:54:34 PDT (-0700), mcroce@xxxxxxxxxxxxxxxxxxx wrote:
> >> > On Mon, Jul 19, 2021 at 1:44 PM Matteo Croce <mcroce@xxxxxxxxxxxxxxxxxxx> wrote:
> >> >>
> >> >> From: Matteo Croce <mcroce@xxxxxxxxxxxxx>
> >> >>
> >> >> Use the generic routines which handle alignment properly.
> >> >>
> >> >> These are the performances measured on a BeagleV machine for a
> >> >> 32 mbyte buffer:
> >> >>
> >> >> memcpy:
> >> >> original aligned:        75 Mb/s
> >> >> original unaligned:      75 Mb/s
> >> >> new aligned:            114 Mb/s
> >> >> new unaligned:          107 Mb/s
> >> >>
> >> >> memset:
> >> >> original aligned:       140 Mb/s
> >> >> original unaligned:     140 Mb/s
> >> >> new aligned:            241 Mb/s
> >> >> new unaligned:          241 Mb/s
> >> >>
> >> >> TCP throughput with iperf3 gives a similar improvement as well.
> >> >>
> >> >> This is the binary size increase according to bloat-o-meter:
> >> >>
> >> >> add/remove: 0/0 grow/shrink: 4/2 up/down: 432/-36 (396)
> >> >> Function                                     old     new   delta
> >> >> memcpy                                        36     324    +288
> >> >> memset                                        32     148    +116
> >> >> strlcpy                                      116     132     +16
> >> >> strscpy_pad                                   84      96     +12
> >> >> strlcat                                      176     164     -12
> >> >> memmove                                       76      52     -24
> >> >> Total: Before=1225371, After=1225767, chg +0.03%
> >> >>
> >> >> Signed-off-by: Matteo Croce <mcroce@xxxxxxxxxxxxx>
> >> >> Signed-off-by: Emil Renner Berthing <kernel@xxxxxxxx>
> >> >> ---
> >> >
> >> > Hi,
> >> >
> >> > can someone have a look at this change and share opinions?
> >>
> >> This LGTM. How are the generic string routines landing? I'm happy to
> >> take this into my for-next, but IIUC we need the optimized generic
> >> versions first so we don't have a performance regression falling back to
> >> the trivial ones for a bit. Is there a shared tag I can pull in?
> >
> > Hi,
> >
> > I see them only in linux-next by now.
>
> These ended up getting rejected by Linus, so I'm going to hold off on
> this for now. If they're really out of lib/ then I'll take the C
> routines in arch/riscv, but either way it's an issue for the next
> release.

Agreed, we should take the C routines in arch/riscv as the common
implementation. If any vendor wants a custom implementation, they
could use the alternative framework in errata for string operations.

-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/