Hello All,
I tested the simplified patch below (ADDC32), and it does not address
the checksum bug. I suspect the problem is that we're still leaving
the carry bit in the upper 16 bits of the 32 bit csum returned, and
this is resulting in a computed checksum that is 1 greater than it
should be. The upper 16 bits of the return value of this function
must be 0, right?
Thanks,
--
-bp
On Sep 17, 2008, at 6:23 AM, Atsushi Nemoto wrote:
On Wed, 17 Sep 2008 11:40:01 +0100 (BST), "Maciej W. Rozycki" <macro@xxxxxxxxxxxxxx
> wrote:
and it seems to fix the problem for me. Can you comment?
It seems obvious that a carry from the bit #15 in the last addition
of
the passed checksum -- ADDC(sum, a2) -- will negate the effect of the
folding. However a simpler fix should do as well. Try if the
following
patch works for you. Please note this is completely untested and
further
optimisation is possible, but I've skipped it in this version for
clarity.
Well, csum_partial()'s return value is __wsum (32-bit). So strictly
there is no need to folding into 16-bit.
I think this bug was introduced by my commit
ed99e2bc1dc5dc54eb5a019f4975562dbef20103 ("[MIPS] Optimize
csum_partial for 64bit kernel"). This commit changed ADDC to using
daddu for 64-bit kernel and that was wrong for the last addition of
partial csum which should be 32-bit addition.
Here is my proposal fix. Bryan, could you try it too?
diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/
csum_partial.S
index 8d77841..8b15766 100644
--- a/arch/mips/lib/csum_partial.S
+++ b/arch/mips/lib/csum_partial.S
@@ -60,6 +60,14 @@
ADD sum, v1; \
.set pop
+#define ADDC32(sum,reg) \
+ .set push; \
+ .set noat; \
+ addu sum, reg; \
+ sltu v1, sum, reg; \
+ addu sum, v1; \
+ .set pop
+
#define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3) \
LOAD _t0, (offset + UNIT(0))(src); \
LOAD _t1, (offset + UNIT(1))(src); \
@@ -280,7 +288,7 @@ LEAF(csum_partial)
1:
.set reorder
/* Add the passed partial csum. */
- ADDC(sum, a2)
+ ADDC32(sum, a2)
jr ra
.set noreorder
END(csum_partial)
@@ -681,7 +689,7 @@ EXC( sb t0, NBYTES-2(dst), .Ls_exc)
.set pop
1:
.set reorder
- ADDC(sum, psum)
+ ADDC32(sum, psum)
jr ra
.set noreorder