Re: [PATCH v8 2/2] lib: checksum: Use aligned accesses for ip_fast_csum and csum_ipv6_magic tests

Helge Deller <deller@xxxxxx> · Fri, 16 Feb 2024 06:54:55 +0100

On 2/15/24 02:58, Guenter Roeck wrote:
Hi Charlie,

On 2/14/24 17:30, Charlie Jenkins wrote:
On Wed, Feb 14, 2024 at 03:03:07PM -0800, Guenter Roeck wrote:
On 2/14/24 13:41, Charlie Jenkins wrote:
The test cases for ip_fast_csum and csum_ipv6_magic were failing on a
variety of architectures that are big endian or do not support
misalgined accesses. Both of these test cases are changed to support big
and little endian architectures.

The test for ip_fast_csum is changed to align the data along (14 +
NET_IP_ALIGN) bytes which is the alignment of an IP header. The test for
csum_ipv6_magic aligns the data using a struct. An extra padding field
is added to the struct to ensure that the size of the struct is the same
on all architectures (44 bytes).

The test for csum_ipv6_magic somewhat arbitrarily aligned saddr and
daddr. This would fail on parisc64 due to the following code snippet in
arch/parisc/include/asm/checksum.h:

add        %4, %0, %0\n"
ldd,ma        8(%1), %6\n"
ldd,ma        8(%2), %7\n"
add,dc        %5, %0, %0\n"

The second add is expecting carry flags from the first add. Normally,
a double word load (ldd) does not modify the carry flags. However,
because saddr and daddr may be misaligned, ldd triggers a misalignment
trap that gets handled in arch/parisc/kernel/unaligned.c. This causes
many additional instructions to be executed between the two adds. This
can be easily solved by adding the carry into %0 before executing the
ldd.

I really think this is a bug either in the trap handler or in the hppa64
qemu emulation. Only unaligned ldd instructions affect (actually,
unconditionally set) the carry flag. That doesn't happen with unaligned
ldw instructions. It would be worthwhile tracking this down since there are
lots of unaligned data accesses (8-byte accesses on 4-byte aligned addresses)
when running the kernel in 64-bit mode. On the other side, I guess this
is a different problem. Not sure though if that should even be mentioned
here since that makes it sound as if it would be expected that such
accesses impact the carry flag.

I wasn't confident it was a bug somewhere so that's why I sent this patch.

However, I have just found the section of the processor manual [1] I was
looking for (Section Privileged Software-Accessible Registers subsection
Processor Status Word (PSW)):

"Processor state is encoded in a 64-bit register called the Processor
Status Word (PSW). When an interruption occurs, the current value of the
PSW is saved in the Interruption Processor Status Word (IPSW) and
usually all defined PSW bits are set to 0.

"The PSW is set to the contents of the IPSW by the RETURN FROM
INTERRUPTION instruction. The interruption handler may restore the
original PSW, modify selected bits, or may change the PSW to an entirely
new value."

Stored in the PSW register are the "Carry/borrow bits". This confirms
that the carry/borrow bits should be restored. The save is supposed to
automatically happen upon an interrupt and restored by the RETURN FROM
INTERRUPTION, thus this is a QEMU bug and not a Linux bug (please
correct me if I am wrong).

I know that much (I looked into the manual as well), I just really don't
know if this is a Linux bug or a QEMU bug, and I have not been able to
nail it down. I think someone with access to hardware will need to confirm.

Specifically: Yes, the carry/borrow bits should be restored. Question is
if the Linux kernel's interrupt handler doesn't restore the carry bits
or if the problem is on the qemu side.

This v8 was not needed after-all it seems. It would be best to stick
with the v7.

I tend to agree; after all, v7 exposes the problem, making it easier to
determine if the problem can be reproduced on real hardware.

FWIW,I wrote some test code which exposes the problem.

Can you please give a pointer to this test code?
I'm happy to try it on real hardware.

It is quite easy to show that carry is always set after executing ldd
on an unaligned address. That is also why I know for sure that the
problem is not seen with ldw on unaligned addresses.
Interesting.
In general I think it's quite important to differentiate between
running on qemu or running on physical hardware.
Qemu just recently got 64-bit support, and it's not yet behaving
like real hardware. One thing I noticed is, that read hardware
does not seem to jump into the exception handler twice, while
qemu does. So, if you run into an exception (e.g. unaligned ldd)
then if a second exception happens in the fault handler (e.g. second
unaligned ldd to resolve wrongly-coded code lookup), you will
get different behaviour between hardware and emulation.
This is also the reason why qemu still fails to emulate newer
64-bit Linux kernels which uses kernel modules.

Helge