On Mon, Jun 21, 2021 at 4:26 PM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote: > > On Thu, Jun 17, 2021 at 05:27:52PM +0200, Matteo Croce wrote: > > +extern void *memcpy(void *dest, const void *src, size_t count); > > +extern void *__memcpy(void *dest, const void *src, size_t count); > > No need for externs. > Right. > > +++ b/arch/riscv/lib/string.c > > Nothing in her looks RISC-V specific. Why doesn't this go into lib/ so > that other architectures can use it as well. > Technically it could go into lib/ and be generic. If you think it's worth it, I have just to handle the different left/right shift because of endianness. > > +#include <linux/module.h> > > I think you only need export.h. > Nice. > > +void *__memcpy(void *dest, const void *src, size_t count) > > +{ > > + const int bytes_long = BITS_PER_LONG / 8; > > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS > > + const int mask = bytes_long - 1; > > + const int distance = (src - dest) & mask; > > +#endif > > + union const_types s = { .u8 = src }; > > + union types d = { .u8 = dest }; > > + > > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS > > + if (count < MIN_THRESHOLD) > > Using IS_ENABLED we can avoid a lot of the mess in this > function. > > int distance = 0; > > if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) { > if (count < MIN_THRESHOLD) > goto copy_remainder; > > /* copy a byte at time until destination is aligned */ > for (; count && d.uptr & mask; count--) > *d.u8++ = *s.u8++; > distance = (src - dest) & mask; > } > Cool. What about putting this check in the very start: if (count < MIN_THRESHOLD) goto copy_remainder; And since count is at least twice bytes_long, remove count from the check below? Also, setting distance after d is aligned is as simple as getting the lower bits of s: if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) { /* Copy a byte at time until destination is aligned. */ for (; d.uptr & mask; count--) *d.u8++ = *s.u8++; distance = s.uptr & mask; } > if (distance) { > ... > > > + /* 32/64 bit wide copy from s to d. > > + * d is aligned now but s is not, so read s alignment wise, > > + * and do proper shift to get the right value. > > + * Works only on Little Endian machines. > > + */ > > Normal kernel comment style always start with a: > Right, I was used to netdev ones :) > /* > > > > + for (next = s.ulong[0]; count >= bytes_long + mask; count -= bytes_long) { > > Please avoid the pointlessly overlong line. And (just as a matter of > personal preference) I find for loop that don't actually use a single > iterator rather confusing. Wjy not simply: > > next = s.ulong[0]; > while (count >= bytes_long + mask) { > ... > count -= bytes_long; > } My fault, in a previous version it was: next = s.ulong[0]; for (; count >= bytes_long + mask; count -= bytes_long) { So to have a single `count` counter for the loop. Regards, -- per aspera ad upstream