Hi Alex, Thanks for your reply. I tested your correction. The result is correct. The C language will cause this function(csum_tcpudp_nofold) become 176 MIPS instructions. The assemble code is 150 MIPS instruction. If the MIPS CPU is running as 1GHz, C language cause more 60 nano seconds during send/receive each tcp/udp packet. I'm not sure whether it will cause any negative result if the frequency of CPU was lower. MIPS arch is usually used in networking equipments. I think Ralf's correction is better. - Mark Signed-off-by: Ralf Baechle <ralf@xxxxxxxxxxxxxx> arch/mips/include/asm/checksum.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/mips/include/asm/checksum.h b/arch/mips/include/asm/checksum.h index 5c585c5..08b6ba3 100644 --- a/arch/mips/include/asm/checksum.h +++ b/arch/mips/include/asm/checksum.h @@ -186,7 +186,9 @@ static inline __wsum csum_tcpudp_nofold(__be32 saddr, " daddu %0, %4 \n" " dsll32 $1, %0, 0 \n" " daddu %0, $1 \n" + " sltu $1, %0, $1 \n" " dsra32 %0, %0, 0 \n" + " daddu %0, $1 \n" #endif " .set pop" : "=r" (sum) 2017-01-26 2:13 GMT+08:00 Alexander Duyck <alexander.duyck@xxxxxxxxx>: > On Tue, Jan 24, 2017 at 8:35 PM, Mark Zhang <bomb.zhang@xxxxxxxxx> wrote: >> If the input parameters as saddr = 0xc0a8fd60,daddr = 0xc0a8fda1,len = >> 80, proto = 17, sum =0x7eae049d. >> The correct result should be 1, but original function return 0. >> >> Attached the correction patch. > > I've copied your patch here: > > From 52e265f7fe0acf9a6e9c4346e1fe6fa994aa00b6 Mon Sep 17 00:00:00 2001 > From: qzhang <qin.2.zhang@xxxxxxx> > Date: Wed, 25 Jan 2017 12:25:25 +0800 > Subject: [PATCH] Fixed the mips 64bits checksum error -- csum_tcpudp_nofold > > --- > arch/mips/include/asm/checksum.h | 4 ++++ > 1 files changed, 4 insertions(+), 0 deletions(-) > > diff --git a/arch/mips/include/asm/checksum.h b/arch/mips/include/asm/checksum.h > index 7749daf..0e351c5 100644 > --- a/arch/mips/include/asm/checksum.h > +++ b/arch/mips/include/asm/checksum.h > @@ -184,6 +184,10 @@ static inline __wsum csum_tcpudp_nofold(__be32 > saddr, __be32 daddr, > " daddu %0, %2 \n" > " daddu %0, %3 \n" > " daddu %0, %4 \n" > + " dsrl32 $1, %0, 0 \n" > + " dsll32 %0, %0, 0 \n" > + " dsrl32 %0, %0, 0 \n" > + " daddu %0, $1 \n" > " dsll32 $1, %0, 0 \n" > " daddu %0, $1 \n" > " dsra32 %0, %0, 0 \n" > > I agree there does appear to be a bug in the code, and my > understanding of MIPS assembly is limited, but I don't think your > patch truly fixes it. From what I can understand it seems like you > would just be shifting the register called out at %0 past 64 bits > which would just zero it out. > > Below is the snippet you are updating: > > #ifdef CONFIG_64BIT > " daddu %0, %2 \n" > " daddu %0, %3 \n" > " daddu %0, %4 \n" > " dsll32 $1, %0, 0 \n" > " daddu %0, $1 \n" > " dsra32 %0, %0, 0 \n" > #endif > > From what I can tell the issue is that the dsll32 really needs to be > replaced with some sort of rotation type call instead of a shift, but > it looks like MIPS doesn't have a rotation instruction. We need to > add the combination of a shift left by 32, to a shift right 32, and > then add that value back to the original register. Then we will get > the correct result in the upper 32 bits. > > I'm not even sure it is necessary to use inline assembler. You could > probably just use the inline assembler for the 32b and change the 64b > approach over to using C. The code for it would be pretty simple: > unsigned long res = (__force unsigned long)sum; > > res += (__force unsigned long)daddr; > res += (__force unsigned long)saddr; > #ifdef __MIPSEL__ > res += (proto + len) << 8; > #else > res += proto + len; > #endif > res += (res << 32) | (res >> 32); > > return (__force __wsum)(res >> 32); > > That would probably be enough to fix the issue and odds are it should > generate assembly code very similar to what was already there. > > - Alex