From: Karl Beldan
> On Tue, Feb 17, 2015 at 12:04:22PM +0000, David Laight wrote:
> > > +static inline u32 from64to32(u64 x)
> > > +{
> > > +	/* add up 32-bit and 32-bit for 32+c bit */
> > > +	x = (x & 0xffffffff) + (x >> 32);
> > > +	/* add up carry.. */
> > > +	x = (x & 0xffffffff) + (x >> 32);
> > > +	return (u32)x;
> > > +}
> >
> > As a matter of interest, does the compiler optimise away the
> > second (x & 0xffffffff) ?
> > The code could just be:
> > 	x = (x & 0xffffffff) + (x >> 32);
> > 	return x + (x >> 32);
> >
>
> On my side, from what I've seen so far, your version results in better
> assembly, esp. with clang, but my first version
> http://article.gmane.org/gmane.linux.kernel/1875407:
> 	x += (x << 32) + (x >> 32);
> 	return (__force __wsum)(x >> 32);
> resulted in even better assembly, I just verified with gcc/clang,
> x86_64/ARM and -O1,2,3.

The latter looks to have a shorter dependency chain as well.
Although I'd definitely include a comment saying that it is
equivalent to the two lines in the current patch.

Does either compiler manage to use a rotate for the two shifts?
Using '(x << 32) | (x >> 32)' might convince it to do so.
That would reduce it to three 'real' instructions and a register rename.

	David
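
For anyone following the thread, here is a minimal user-space sketch (an
assumption on my part: plain uint32_t/uint64_t stand in for the kernel's
u32/u64, and the __force __wsum cast is dropped) that spot-checks that the
rotate form suggested above folds a 64-bit sum the same way as the
two-line version in the patch:

	#include <stdint.h>
	#include <stddef.h>
	#include <stdio.h>

	/* Reference: the two-fold version from the patch under discussion. */
	static inline uint32_t from64to32_fold(uint64_t x)
	{
		/* add up 32-bit and 32-bit for 32+c bit */
		x = (x & 0xffffffff) + (x >> 32);
		/* add up carry.. */
		x = (x & 0xffffffff) + (x >> 32);
		return (uint32_t)x;
	}

	/*
	 * Rotate-based variant in the spirit of the suggestion above:
	 * (x << 32) | (x >> 32) is a 32-bit rotate of the 64-bit value,
	 * and since the shifted halves never overlap, OR equals ADD here,
	 * so the upper half of the sum is the same 32+carry fold.
	 */
	static inline uint32_t from64to32_rot(uint64_t x)
	{
		x += (x << 32) | (x >> 32);
		return (uint32_t)(x >> 32);
	}

	int main(void)
	{
		/* A few edge cases to spot-check the equivalence. */
		uint64_t samples[] = {
			0, 1, 0xffffffffULL, 0x100000000ULL,
			0xffffffff00000001ULL, 0xffffffffffffffffULL,
			0x123456789abcdef0ULL,
		};
		for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
			if (from64to32_fold(samples[i]) != from64to32_rot(samples[i])) {
				printf("mismatch at %#llx\n",
				       (unsigned long long)samples[i]);
				return 1;
			}
		}
		printf("all samples match\n");
		return 0;
	}

Since the two shifted halves share no bits, '+' and '|' produce identical
values; the OR form just makes the rotate pattern easier for the compiler
to recognise, which is the point of the suggestion.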