Re: [PATCH v11] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

Christophe Leroy <christophe.leroy@xxxxxxxxxx> · Mon, 4 Mar 2024 11:39:02 +0000

Hi Russell and Guenter,

Le 03/03/2024 à 16:26, Guenter Roeck a écrit :
> On 3/3/24 02:20, Christophe Leroy wrote:
>>
>>
>> Le 01/03/2024 à 19:32, Guenter Roeck a écrit :
>>> This leaves the mps2-an385:mps2_defconfig crash, which is avoided by
>>> this patch.
>>> My understanding, which may be wrong, is that arm images with thumb
>>> instructions
>>> do not support unaligned accesses (maybe I should say do not support
>>> unaligned
>>> accesses with the mps2-an385 qemu emulation; I did not test with real
>>> hardware,
>>> after all).

...

>>
>> Can you tell how to proceed ?
>>
> 
> You can't run it directly. mps2-an385 is one of the platforms where
> the qemu maintainers insisted that qemu shall not initialize the CPU.
> You have to provide a shim such as
> https://github.com/groeck/linux-build-test/blob/master/rootfs/arm/mps2-boot.axf
> as bios. You also have to provide the dtb file.
> 
> On top of that, you would need a customized version of qemu which
> actually reads the command line, the bios file, and the dtb. See
> https://github.com/groeck/linux-build-test/tree/master/qemu
> branch v8.2.1-local or v8.1.5-local.
> 

Many thanks for your guidance. So, I did the test and what I can say:

ip_fast_csum() works whatever the alignment is.

csum_ipv6_magic() is the problem with unaligned ipv6 source or 
destination addresses:

[    0.503757] KTAP version 1
[    0.503854] 1..1
[    0.504156]     KTAP version 1
[    0.504251]     # Subtest: checksum
[    0.504563]     # module: checksum_kunit
[    0.504730]     1..5
[    0.546418]     ok 1 test_csum_fixed_random_inputs
[    0.627853]     ok 2 test_csum_all_carry_inputs
[    0.704918]     ok 3 test_csum_no_carry_inputs
[    0.705845]     ok 4 test_ip_fast_csum
[    0.706320]
[    0.706320] Unhandled exception: IPSR = 00000006 LR = fffffff1
[    0.706796] CPU: 0 PID: 28 Comm: kunit_try_catch Tainted: G 
       N 6.8.0-rc1-00609-g9c0b7a2e25f0 #649
[    0.707177] Hardware name: Generic DT based system
[    0.707400] PC is at __csum_ipv6_magic+0x8/0xb4
[    0.708170] LR is at test_csum_ipv6_magic+0x3d/0xa4
[    0.708415] pc : [<211b0da8>]    lr : [<210e3bf5>]    psr: 0100020b
[    0.708692] sp : 2153debc  ip : 46c7f0d2  fp : 00000000
[    0.708919] r10: 00000000  r9 : 2141dc48  r8 : 211e0e20
[    0.709148] r7 : 00003085  r6 : 00000001  r5 : 2141dd24  r4 : 211e0c2e
[    0.709422] r3 : 2c000000  r2 : 1ac7f0d2  r1 : 211e0c19  r0 : 211e0c09
[    0.709704] xPSR: 0100020b


I don't know much about ARM instruction set, seems like the ldr 
instruction used in ip_fast_csum() doesn't mind unaligned accesses while 
ldmia instruction used in csum_ipv6_magic() minds. Or is it a wrong 
behaviour of QEMU ?

If I change the test as follows to only use word aligned IPv6 addresses, 
it works:

diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 225bb7701460..4d86fc8ccd78 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -607,7 +607,7 @@ static void test_csum_ipv6_magic(struct kunit *test)
  	const int csum_offset = sizeof(struct in6_addr) + sizeof(struct 
in6_addr) +
  			    sizeof(int) + sizeof(char);

-	for (int i = 0; i < NUM_IPv6_TESTS; i++) {
+	for (int i = 0; i < NUM_IPv6_TESTS; i += 4) {
  		saddr = (const struct in6_addr *)(random_buf + i);
  		daddr = (const struct in6_addr *)(random_buf + i +
  						  daddr_offset);


If I change csum_ipv6_magic() as follows to use instruction ldr instead 
of ldmia, it also works without any change to the test:

diff --git a/arch/arm/lib/csumipv6.S b/arch/arm/lib/csumipv6.S
index 3559d515144c..a312d0836b95 100644
--- a/arch/arm/lib/csumipv6.S
+++ b/arch/arm/lib/csumipv6.S
@@ -12,12 +12,18 @@
  ENTRY(__csum_ipv6_magic)
  		str	lr, [sp, #-4]!
  		adds	ip, r2, r3
-		ldmia	r1, {r1 - r3, lr}
+		ldr	r2, [r1], #4
+		ldr	r3, [r1], #4
+		ldr	lr, [r1], #4
+		ldr	r1, [r1]
  		adcs	ip, ip, r1
  		adcs	ip, ip, r2
  		adcs	ip, ip, r3
  		adcs	ip, ip, lr
-		ldmia	r0, {r0 - r3}
+		ldr	r1, [r0], #4
+		ldr	r2, [r0], #4
+		ldr	r3, [r0], #4
+		ldr	r0, [r0]
  		adcs	r0, ip, r0
  		adcs	r0, r0, r1
  		adcs	r0, r0, r2


So now we are back to the initial question, should checksumming on 
unaligned addresses be supported or not ?

Russell I understand from previous answers from you that half-word 
alignment should be supported, in that case should ARM version of 
csum_ipv6_magic() be modified ? In that case can you propose the most 
optimised fix ?

If not, then the test has to be fixed to only use word-aligned IPv6 
addresses.

Thanks
Christophe