I realized that ARM uses the generic memmove() implementation, which is rather slow. This series adds the assembler-optimized version for ARM. The corresponding recent Linux code no longer fits into barebox as-is, so the surrounding code has to be updated before it can be merged; hence the series is bigger than I would like it to be.

Sascha

Signed-off-by: Sascha Hauer <s.hauer@xxxxxxxxxxxxxx>
---
Changes in v2:
- Add a note stating which Linux version the updated files are from
- Drop unused copy_template.S for ARM64
- Drop unnecessary AFLAGS_NOWARN
- Restore the SPDX-FileCopyrightText lines in memcpy.S
- Link to v1: https://lore.barebox.org/20240925-arm-assembly-memmove-v1-0-0d92103658a0@xxxxxxxxxxxxxx

---
Sascha Hauer (10):
      ARM: Use optimized reads[bwl] and writes[bwl] functions
      ARM: rename logical shift macros push pull into lspush lspull
      ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+
      ARM: update lib1funcs.S from Linux
      ARM: update findbit.S from Linux
      ARM: update io-* from Linux
      ARM: always assume the unified syntax for assembly code
      ARM: update memcpy.S and memset.S from Linux
      lib/string.c: export non optimized memmove as __default_memmove
      ARM: add optimized memmove

 arch/arm/Kconfig                   |   4 -
 arch/arm/cpu/cache-armv4.S         |  11 +-
 arch/arm/cpu/cache-armv5.S         |  13 +-
 arch/arm/cpu/cache-armv6.S         |  13 +-
 arch/arm/cpu/cache-armv7.S         |   9 +-
 arch/arm/cpu/hyp.S                 |   3 +-
 arch/arm/cpu/setupc_32.S           |   7 +-
 arch/arm/cpu/sm_as.S               |   3 +-
 arch/arm/include/asm/assembler.h   |  36 ++++-
 arch/arm/include/asm/cache.h       |   8 ++
 arch/arm/include/asm/io.h          |  24 ++++
 arch/arm/include/asm/string.h      |   4 +-
 arch/arm/include/asm/unified.h     |  75 +----------
 arch/arm/lib32/Makefile            |   1 +
 arch/arm/lib32/ashldi3.S           |   3 +-
 arch/arm/lib32/ashrdi3.S           |   3 +-
 arch/arm/lib32/copy_template.S     |  86 ++++++------
 arch/arm/lib32/findbit.S           | 243 +++++++++------------------------
 arch/arm/lib32/io-readsb.S         |  32 ++---
 arch/arm/lib32/io-readsl.S         |  32 ++---
 arch/arm/lib32/io-readsw-armv4.S   |  26 ++--
 arch/arm/lib32/io-writesb.S        |  34 ++---
 arch/arm/lib32/io-writesl.S        |  36 ++---
 arch/arm/lib32/io-writesw-armv4.S  |  16 +--
 arch/arm/lib32/lib1funcs.S         |  80 ++++++-----
 arch/arm/lib32/lshrdi3.S           |   3 +-
 arch/arm/lib32/memcpy.S            |  30 +++--
 arch/arm/lib32/memmove.S           | 206 ++++++++++++++++++++++++++++
 arch/arm/lib32/memset.S            |  96 ++++++++-----
 arch/arm/lib32/runtime-offset.S    |   2 +-
 arch/arm/lib64/copy_template.S     | 180 -------------------------
 arch/arm/lib64/memcpy.S            | 274 ++++++++++++++++++++++++++++++++------
 arch/arm/lib64/memset.S            |  18 ++-
 arch/arm/lib64/string.c            |  17 +++
 include/string.h                   |   2 +
 lib/string.c                       |  11 +-
 36 files changed, 940 insertions(+), 701 deletions(-)
---
base-commit: 419ea9350aa083d4a2806a70132129a49a5ecf95
change-id: 20240925-arm-assembly-memmove-8eccb9affa1b

Best regards,
-- 
Sascha Hauer <s.hauer@xxxxxxxxxxxxxx>