On Mon, Mar 4, 2024, at 12:39, Christophe Leroy wrote: > Le 03/03/2024 à 16:26, Guenter Roeck a écrit : >> On 3/3/24 02:20, Christophe Leroy wrote: > > I don't know much about ARM instruction set, seems like the ldr > instruction used in ip_fast_csum() doesn't mind unaligned accesses while > ldmia instruction used in csum_ipv6_magic() minds. Or is it a wrong > behaviour of QEMU ? Correct. On ARMv6 and newer, accessing normal unaligned memory with ldr/str does not trap, and that covers most unaligned accesses. Some of the cases that don't allow unaligned access include: - ARMv4/ARMv5 cannot access unaligned memory with the same instructions. Apparently the same is true for ARMv7-M. - multi-word accesses (ldrd/strd and ldm/stm) require 32-bit alignment. These are generated for most 64-bit variables and some arrays - unaligned access on MMIO registers (__iomem pointers) always trap - atomic access (ldrex/strex) requires aligned data - The C standard disallows casting to a type with larger alignment requirements, and gcc is known to produce code that doesn't work with this (and other) undefined behavior. > If I change the test as follows to only use word aligned IPv6 addresses, > it works: > > diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c > index 225bb7701460..4d86fc8ccd78 100644 > --- a/lib/checksum_kunit.c > +++ b/lib/checksum_kunit.c > @@ -607,7 +607,7 @@ static void test_csum_ipv6_magic(struct kunit *test) > const int csum_offset = sizeof(struct in6_addr) + sizeof(struct > in6_addr) + > sizeof(int) + sizeof(char); > > - for (int i = 0; i < NUM_IPv6_TESTS; i++) { > + for (int i = 0; i < NUM_IPv6_TESTS; i += 4) { > saddr = (const struct in6_addr *)(random_buf + i); > daddr = (const struct in6_addr *)(random_buf + i + > daddr_offset); > > > If I change csum_ipv6_magic() as follows to use instruction ldr instead > of ldmia, it also works without any change to the test: > > diff --git a/arch/arm/lib/csumipv6.S b/arch/arm/lib/csumipv6.S > index 3559d515144c..a312d0836b95 100644 > --- a/arch/arm/lib/csumipv6.S > +++ b/arch/arm/lib/csumipv6.S > @@ -12,12 +12,18 @@ > ENTRY(__csum_ipv6_magic) > str lr, [sp, #-4]! > adds ip, r2, r3 > - ldmia r1, {r1 - r3, lr} > + ldr r2, [r1], #4 > + ldr r3, [r1], #4 > + ldr lr, [r1], #4 > + ldr r1, [r1] > > So now we are back to the initial question, should checksumming on > unaligned addresses be supported or not ? > > Russell I understand from previous answers from you that half-word > alignment should be supported, in that case should ARM version of > csum_ipv6_magic() be modified ? In that case can you propose the most > optimised fix ? The csumipv6.S code predates ARMv6 and is indeed suboptimal on v6/v7 processors with unaligned ipv6 headers. Your workaround looks like it should be much better, but it would at the same time make the ARMv5 case much more expensive because it traps four times instead of just one. > If not, then the test has to be fixed to only use word-aligned IPv6 > addresses. Because of the gcc issue I mentioned, net/ipv6/ip6_checksum.c and anything else that accesses misaligned ipv6 headers may need to be changed as well. Marking in6_addr as '__packed __aligned(2)' should be sufficient for that. This will prevent gcc from issuing ldm or ldrd on ARMv6+ as well as making optimization based on the two lower bits of the address being zero on x86 and others. The downside is that it forces 16-bit loads and stores to be used on architectures that don't have efficient unaligned access (armv5, alpha, mips, sparc and xtensa among others) even when the IP headers are fully aligned. Arnd