On Tue, Jun 22, 2021 at 2:15 AM Nick Kossifidis <mick@xxxxxxxxxxxx> wrote: > > Hello Matteo and thanks for the patch, > > Στις 2021-06-17 18:27, Matteo Croce έγραψε: > > > > @@ -0,0 +1,91 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* > > + * String functions optimized for hardware which doesn't > > + * handle unaligned memory accesses efficiently. > > + * > > + * Copyright (C) 2021 Matteo Croce > > + */ > > + > > +#include <linux/types.h> > > +#include <linux/module.h> > > + > > +/* Minimum size for a word copy to be convenient */ > > +#define MIN_THRESHOLD (BITS_PER_LONG / 8 * 2) > > + > > +/* convenience union to avoid cast between different pointer types */ > > +union types { > > + u8 *u8; > > You are using a type as a name, I'd go with as_bytes/as_ulong/as_uptr > which makes it easier for the reader to understand what you are trying > to do. > Makes sense > > + unsigned long *ulong; > > + uintptr_t uptr; > > +}; > > + > > +union const_types { > > + const u8 *u8; > > + unsigned long *ulong; > > +}; > > + > > I suggest you define those unions inside the function body, no one else > is using them. > They will be used in memset(), in patch 3/3 > > +void *__memcpy(void *dest, const void *src, size_t count) > > +{ > > + const int bytes_long = BITS_PER_LONG / 8; > > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS > > + const int mask = bytes_long - 1; > > + const int distance = (src - dest) & mask; > > Why not unsigned ints ? > Ok. > > +#endif > > + union const_types s = { .u8 = src }; > > + union types d = { .u8 = dest }; > > + > > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS > > If you want to be compliant with memcpy you should check for overlapping > regions here since "The memory areas must not overlap", and do nothing > about it because according to POSIX this leads to undefined behavior. > That's why recent libc implementations use memmove in any case (memcpy > is an alias to memmove), which is the suggested approach. > Mmm which memcpy arch implementation does this check? I guess that noone is currently doing it. > > + if (count < MIN_THRESHOLD) > > + goto copy_remainder; > > + > > + /* copy a byte at time until destination is aligned */ > > + for (; count && d.uptr & mask; count--) > > + *d.u8++ = *s.u8++; > > + > > You should check for !IS_ENABLED(CONFIG_CPU_BIG_ENDIAN) here. > I tought that only Little Endian RISC-V machines were supported in Linux. Should I add a BUILD_BUG_ON()? Anyway, if this is going in generic lib/, I'll take care of the endianness. > > + if (distance) { > > + unsigned long last, next; > > + > > + /* move s backward to the previous alignment boundary */ > > + s.u8 -= distance; > > It'd help here to explain that since s is distance bytes ahead relative > to d, and d reached the alignment boundary above, s is now aligned but > the data needs to be shifted to compensate for distance, in order to do > word-by-word copy. > > > + > > + /* 32/64 bit wide copy from s to d. > > + * d is aligned now but s is not, so read s alignment wise, > > + * and do proper shift to get the right value. > > + * Works only on Little Endian machines. > > + */ > > This commend is misleading because s is aligned or else s.ulong[0]/[1] > below would result an unaligned access. > Yes, those two comments should be rephrased, merged and put above. > > + for (next = s.ulong[0]; count >= bytes_long + mask; count -= > > bytes_long) { > > + last = next; > > + next = s.ulong[1]; > > + > > + d.ulong[0] = last >> (distance * 8) | > > + next << ((bytes_long - distance) * 8); > > + > > + d.ulong++; > > + s.ulong++; > > + } > > + > > + /* restore s with the original offset */ > > + s.u8 += distance; > > + } else > > +#endif > > + { > > + /* if the source and dest lower bits are the same, do a simple > > + * 32/64 bit wide copy. > > + */ > > A while() loop would make more sense here. > Ok. > > + for (; count >= bytes_long; count -= bytes_long) > > + *d.ulong++ = *s.ulong++; > > + } > > + > > + /* suppress warning when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y */ > > + goto copy_remainder; > > + > > +copy_remainder: > > + while (count--) > > + *d.u8++ = *s.u8++; > > + > > + return dest; > > +} > > +EXPORT_SYMBOL(__memcpy); > > + > > +void *memcpy(void *dest, const void *src, size_t count) __weak > > __alias(__memcpy); > > +EXPORT_SYMBOL(memcpy); Regards, -- per aspera ad upstream