On Wed, Feb 18, 2015 at 09:40:23AM +0000, David Laight wrote: > From: Karl Beldan > > On Tue, Feb 17, 2015 at 12:04:22PM +0000, David Laight wrote: > > > > +static inline u32 from64to32(u64 x) > > > > +{ > > > > + /* add up 32-bit and 32-bit for 32+c bit */ > > > > + x = (x & 0xffffffff) + (x >> 32); > > > > + /* add up carry.. */ > > > > + x = (x & 0xffffffff) + (x >> 32); > > > > + return (u32)x; > > > > +} > > > > > > As a matter of interest, does the compiler optimise away the > > > second (x & 0xffffffff) ? > > > The code could just be: > > > x = (x & 0xffffffff) + (x >> 32); > > > return x + (x >> 32); > > > > > > > On my side, from what I've seen so far, your version results in better > > assembly, esp. with clang, but my first version > > http://article.gmane.org/gmane.linux.kernel/1875407: > > x += (x << 32) + (x >> 32); > > return (__force __wsum)(x >> 32); > > resulted in even better assembly, I just verified with gcc/clang, > > x86_64/ARM and -O1,2,3. > > The latter looks to have a shorter dependency chain as well. > Although I'd definitely include a comment saying that it is equivalent > to the two lines in the current patch. > > Does either compiler manage to use a rotate for the two shifts? > Using '(x << 32) | (x >> 32)' might convince it to do so. > That would reduce it to three 'real' instructions and a register rename. > gcc and clang rotate for tile (just checked gcc) and x86_64, not for arm (and IMHO rightly so). Both '|' and '+' yielded the same asm for those 3 archs. Karl -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html