On Wed, 17 Sep 2008 11:40:01 +0100 (BST), "Maciej W. Rozycki" <macro@xxxxxxxxxxxxxx> wrote:
> > and it seems to fix the problem for me. Can you comment?
>
>  It seems obvious that a carry from the bit #15 in the last addition of
> the passed checksum -- ADDC(sum, a2) -- will negate the effect of the
> folding. However a simpler fix should do as well. Try if the following
> patch works for you. Please note this is completely untested and further
> optimisation is possible, but I've skipped it in this version for clarity.

Well, csum_partial()'s return value is __wsum (32-bit), so strictly
there is no need for folding into 16 bits.

I think this bug was introduced by my commit
ed99e2bc1dc5dc54eb5a019f4975562dbef20103 ("[MIPS] Optimize csum_partial
for 64bit kernel"). That commit changed ADDC to use daddu on 64-bit
kernels, which is wrong for the last addition of the passed partial
csum: that one must be a 32-bit addition.

Here is my proposed fix. Bryan, could you try it too?

diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
index 8d77841..8b15766 100644
--- a/arch/mips/lib/csum_partial.S
+++ b/arch/mips/lib/csum_partial.S
@@ -60,6 +60,14 @@
 	ADD	sum, v1;					\
 	.set	pop

+#define ADDC32(sum,reg)						\
+	.set	push;						\
+	.set	noat;						\
+	addu	sum, reg;					\
+	sltu	v1, sum, reg;					\
+	addu	sum, v1;					\
+	.set	pop
+
 #define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3)	\
 	LOAD	_t0, (offset + UNIT(0))(src);			\
 	LOAD	_t1, (offset + UNIT(1))(src);			\
@@ -280,7 +288,7 @@ LEAF(csum_partial)
 1:
 	.set	reorder
 	/* Add the passed partial csum. */
-	ADDC(sum, a2)
+	ADDC32(sum, a2)
 	jr	ra
 	.set	noreorder
 	END(csum_partial)
@@ -681,7 +689,7 @@ EXC(	sb	t0, NBYTES-2(dst), .Ls_exc)
 	.set	pop
 1:
 	.set	reorder
-	ADDC(sum, psum)
+	ADDC32(sum, psum)
 	jr	ra
 	.set	noreorder