On Thu, Sep 06, 2018 at 04:24:59PM +0200, Arnd Bergmann wrote:
> On Wed, Sep 5, 2018 at 2:08 PM Guo Ren <ren_guo@xxxxxxxxx> wrote:
>
> > --- /dev/null
> > +++ b/arch/csky/abiv1/memset.c
> > @@ -0,0 +1,38 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +// Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd.
> > +#include <linux/types.h>
> > +
> > +void *memset(void *dest, int c, size_t l)
> > +{
> > +	char *d = dest;
> > +	int ch = c;
> > +	int tmp;
> > +
> > +	if ((long)d & 0x3)
> > +		while (l--) *d++ = ch;
> > +	else {
> > +		ch &= 0xff;
> > +		tmp = (ch | ch << 8 | ch << 16 | ch << 24);
> > +
> > +		while (l >= 16) {
> > +			*(((long *)d)) = tmp;
> > +			*(((long *)d)+1) = tmp;
> > +			*(((long *)d)+2) = tmp;
> > +			*(((long *)d)+3) = tmp;
> > +			l -= 16;
> > +			d += 16;
> > +		}
> > +
> > +		while (l > 3) {
> > +			*(((long *)d)) = tmp;
> > +			d = d + 4;
> > +			l -= 4;
> > +		}
> > +
> > +		while (l) {
> > +			*d++ = ch;
> > +			l--;
> > +		}
> > +	}
> > +	return dest;
> > +}
>
> I see that we have a trivial memset() implementation in lib/string.c, but yours
> seems to be better optimized. Where did you get it from?

We wrote it for our ck610 to improve performance. I think many other
architectures do the same thing in assembly.

> Is this a version
> that works particularly well on C-Sky, or is this a generic optimized memset
> that others could use as well?

We have only tested it on C-SKY, but I think it would also perform better
on other architectures than the current memset implementation in
lib/string.c, which is:

void *memset(void *s, int c, size_t count)
{
	char *xs = s;

	while (count--)
		*xs++ = c;
	return s;
}

The main problem is "char *xs": every store goes through a char pointer,
so the loop compiles to "st.b" (store byte) instructions, and "st.b" is
very slow. Our key improvement is:

> > +			*(((long *)d)) = tmp;
> > +			*(((long *)d)+1) = tmp;
> > +			*(((long *)d)+2) = tmp;
> > +			*(((long *)d)+3) = tmp;

These four consecutive word stores cause an AXI burst transfer on the SoC.

> In the latter case, we could add it to
> lib/string.c and let architectures select it in place of the trivial version.

Good idea.

 Guo Ren
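
To illustrate what a generic version of the word-store idea discussed above
could look like, here is a minimal sketch. It is not the code from the patch
and not what lib/string.c contains: the name memset_words() is hypothetical,
the unaligned-head loop is an addition (the quoted patch falls back to byte
stores for the whole buffer when dest is not 4-byte aligned), and the word
size is taken from sizeof(unsigned long) instead of being fixed at 4. It
assumes a kernel-style build with -fno-strict-aliasing.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Assumes the build disables strict-aliasing optimizations (as the
 * kernel does), because the buffer is written through unsigned long *.
 */
void *memset_words(void *dest, int c, size_t n)
{
	unsigned char *d = dest;
	unsigned char byte = (unsigned char)c;
	/* ~0UL / 0xff is 0x0101...01, so this copies byte into every lane. */
	unsigned long word = (~0UL / 0xff) * byte;

	/* Head: byte stores until d is word-aligned or n is exhausted. */
	while (n && ((uintptr_t)d & (sizeof(word) - 1))) {
		*d++ = byte;
		n--;
	}

	/* Body: four word stores per iteration, as in the quoted patch. */
	while (n >= 4 * sizeof(word)) {
		unsigned long *w = (unsigned long *)d;

		w[0] = word;
		w[1] = word;
		w[2] = word;
		w[3] = word;
		d += 4 * sizeof(word);
		n -= 4 * sizeof(word);
	}

	/* Remaining whole words. */
	while (n >= sizeof(word)) {
		*(unsigned long *)d = word;
		d += sizeof(word);
		n -= sizeof(word);
	}

	/* Tail: fewer than sizeof(word) bytes left. */
	while (n--)
		*d++ = byte;

	return dest;
}
```

Whether the unrolled body actually results in burst transfers depends on the
CPU and bus, so any such generic version would need measuring on each
architecture that selects it.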