On Tue, 16 Sep 2008, Bryan Phillippe wrote:

> I've experimented with the following change:
>
> --- /home/bp/tmp/csum_partial.S.orig	2008-09-16 12:01:00.000000000 -0700
> +++ arch/mips/lib/csum_partial.S	2008-09-16 11:51:44.000000000 -0700
> @@ -281,6 +281,23 @@
>  	.set	reorder
>  	/* Add the passed partial csum.  */
>  	ADDC(sum, a2)
> +
> +	/* fold checksum again to clear the high bits before returning */
> +	.set	push
> +	.set	noat
> +#ifdef USE_DOUBLE
> +	dsll32	v1, sum, 0
> +	daddu	sum, v1
> +	sltu	v1, sum, v1
> +	dsra32	sum, sum, 0
> +	addu	sum, v1
> +#endif
> +	sll	v1, sum, 16
> +	addu	sum, v1
> +	sltu	v1, sum, v1
> +	srl	sum, sum, 16
> +	addu	sum, v1
> +
>  	jr	ra
>  	.set	noreorder
>  	END(csum_partial)
>
> and it seems to fix the problem for me.  Can you comment?

 It seems obvious that a carry from bit #15 in the last addition of the
passed checksum -- ADDC(sum, a2) -- will negate the effect of the folding.
However, a simpler fix should do as well: add the passed partial csum
before the fold rather than refolding after it.  Try whether the following
patch works for you.  Please note it is completely untested and further
optimisation is possible, but I've skipped that in this version for
clarity.

 Thanks for raising the issue.

  Maciej

Signed-off-by: Maciej W. Rozycki <macro@xxxxxxxxxxxxxx>

--- a/arch/mips/lib/csum_partial.S	2008-05-05 02:55:23.000000000 +0000
+++ b/arch/mips/lib/csum_partial.S	2008-09-17 10:32:37.000000000 +0000
@@ -253,6 +253,9 @@ LEAF(csum_partial)
 1:
 	ADDC(sum, t1)
 
+	/* Add the passed partial csum.  */
+	ADDC(sum, a2)
+
 	/* fold checksum */
 	.set	push
 	.set	noat
@@ -278,11 +281,8 @@ LEAF(csum_partial)
 	andi	sum, 0xffff
 	.set	pop
 1:
-	.set	reorder
-	/* Add the passed partial csum.  */
-	ADDC(sum, a2)
 	jr	ra
-	.set	noreorder
+	nop
 	END(csum_partial)
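
For illustration only, here is a minimal userspace C sketch of the
arithmetic being discussed (it is not part of either patch; the
fold32to16() helper and the example values are made up).  It mimics the
32-to-16-bit fold with end-around carry that the sll/addu/sltu/srl/addu
sequence performs, and shows that adding the passed partial csum after
the fold can carry out of bit 15, whereas folding after the addition
keeps the result within 16 bits:

#include <stdio.h>
#include <stdint.h>

/* Fold a 32-bit ones' complement sum to 16 bits with end-around carry;
 * functionally what the assembly fold above does. */
static uint32_t fold32to16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);	/* first fold */
	sum = (sum & 0xffff) + (sum >> 16);	/* absorb any new carry */
	return sum;
}

int main(void)
{
	uint32_t sum = 0xffff;	/* checksum already folded to 16 bits */
	uint32_t a2 = 0x0001;	/* passed partial csum */

	/* Adding after the fold can overflow bit 15 ... */
	uint32_t unfolded = sum + a2;		/* 0x10000: wider than 16 bits */
	/* ... while folding after the addition keeps the invariant. */
	uint32_t folded = fold32to16(sum + a2);	/* 0x0001 */

	printf("add after fold: %#x\n", unfolded);
	printf("fold after add: %#x\n", folded);
	return 0;
}

The same reasoning applies to the 64-bit USE_DOUBLE fold: any addition
performed after the final fold must itself be folded, which is why
moving ADDC(sum, a2) ahead of the existing fold is sufficient.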