Re: [PATCH v11] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

"Arnd Bergmann" <arnd@xxxxxxxx> · Mon, 04 Mar 2024 14:39:59 +0100

On Mon, Mar 4, 2024, at 12:39, Christophe Leroy wrote:
> Le 03/03/2024 à 16:26, Guenter Roeck a écrit :
>> On 3/3/24 02:20, Christophe Leroy wrote:
>
> I don't know much about ARM instruction set, seems like the ldr 
> instruction used in ip_fast_csum() doesn't mind unaligned accesses while 
> ldmia instruction used in csum_ipv6_magic() minds. Or is it a wrong 
> behaviour of QEMU ?

Correct.

On ARMv6 and newer, accessing normal unaligned memory with ldr/str
does not trap, and that covers most unaligned accesses.

Some of the cases that don't allow unaligned access include:

- ARMv4/ARMv5 cannot access unaligned memory with the same
  instructions. Apparently the same is true for ARMv7-M.

- multi-word accesses (ldrd/strd and ldm/stm) require 32-bit
  alignment. These are generated for most 64-bit variables
  and some arrays

- unaligned access on MMIO registers (__iomem pointers)
  always trap

- atomic access (ldrex/strex) requires aligned data

- The C standard disallows casting to a type with larger
  alignment requirements, and gcc is known to produce
  code that doesn't work with this (and other) undefined
  behavior.

> If I change the test as follows to only use word aligned IPv6 addresses, 
> it works:
>
> diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
> index 225bb7701460..4d86fc8ccd78 100644
> --- a/lib/checksum_kunit.c
> +++ b/lib/checksum_kunit.c
> @@ -607,7 +607,7 @@ static void test_csum_ipv6_magic(struct kunit *test)
>   	const int csum_offset = sizeof(struct in6_addr) + sizeof(struct 
> in6_addr) +
>   			    sizeof(int) + sizeof(char);
>
> -	for (int i = 0; i < NUM_IPv6_TESTS; i++) {
> +	for (int i = 0; i < NUM_IPv6_TESTS; i += 4) {
>   		saddr = (const struct in6_addr *)(random_buf + i);
>   		daddr = (const struct in6_addr *)(random_buf + i +
>   						  daddr_offset);
>
>
> If I change csum_ipv6_magic() as follows to use instruction ldr instead 
> of ldmia, it also works without any change to the test:
>
> diff --git a/arch/arm/lib/csumipv6.S b/arch/arm/lib/csumipv6.S
> index 3559d515144c..a312d0836b95 100644
> --- a/arch/arm/lib/csumipv6.S
> +++ b/arch/arm/lib/csumipv6.S
> @@ -12,12 +12,18 @@
>   ENTRY(__csum_ipv6_magic)
>   		str	lr, [sp, #-4]!
>   		adds	ip, r2, r3
> -		ldmia	r1, {r1 - r3, lr}
> +		ldr	r2, [r1], #4
> +		ldr	r3, [r1], #4
> +		ldr	lr, [r1], #4
> +		ldr	r1, [r1]
>
> So now we are back to the initial question, should checksumming on 
> unaligned addresses be supported or not ?
>
> Russell I understand from previous answers from you that half-word 
> alignment should be supported, in that case should ARM version of 
> csum_ipv6_magic() be modified ? In that case can you propose the most 
> optimised fix ?

The csumipv6.S code predates ARMv6 and is indeed suboptimal on v6/v7
processors with unaligned ipv6 headers. Your workaround looks like
it should be much better, but it would at the same time make the
ARMv5 case much more expensive because it traps four times instead
of just one.

> If not, then the test has to be fixed to only use word-aligned IPv6 
> addresses.

Because of the gcc issue I mentioned, net/ipv6/ip6_checksum.c
and anything else that accesses misaligned ipv6 headers may need
to be changed as well. Marking in6_addr as '__packed __aligned(2)'
should be sufficient for that. This will prevent gcc from issuing
ldm or ldrd on ARMv6+ as well as making optimization based on
the two lower bits of the address being zero on x86 and others.
The downside is that it forces 16-bit loads and stores to be
used on architectures that don't have efficient unaligned
access (armv5, alpha, mips, sparc and xtensa among others)
even when the IP headers are fully aligned.

     Arnd