On Tue, 2012-04-24 at 12:37 -0700, Linus Torvalds wrote:
> Also, it might be worth looking at code generation, to see if it's
> better to just do
>
> 	a.hi += b.hi;
> 	a.low += b.low;
> 	if (a.low < b.low)
> 		a.hi++;
> 	return a;
>
> because that might make it clear that there are fewer actual values
> live at any particular time. But gcc may not care. Try it.

It does indeed generate tons better code.

FWIW, Mans' suggestion of:

	a.hi += a.lo < b.lo;

horribly confuses gcc.

> Also, for the multiply, please make sure gcc knows to do a "32x32->64"
> multiplication, rather than thinking it needs to do full 64x64
> multiplies..
>
> I'm not sure gcc understands that as you wrote it.

It does indeed grok it (as Mans also confirmed for ARM), however:

> You are probably
> better off actually using 32-bit values, and then an explicit cast, ie
>
> 	u32 a32_0 = .. low 32 bits of a ..
> 	u32 b32_0 = .. low 32 bits of b ..
> 	u64 res64_0 = (u64) a32_0 * (u64) b32_0;
>
> but if gcc understands it from the shifts and masks, I guess it doesn't matter.

that does generate slightly better code in that it avoids some masks on 64bit:

@@ -7,12 +7,11 @@
 .LFB38:
 	.cfi_startproc
 	movq	%rdi, %r8
-	movq	%rdi, %rdx
 	movq	%rsi, %rcx
+	mov	%edi, %edx
 	shrq	$32, %r8
-	andl	$4294967295, %edx
 	shrq	$32, %rcx
-	andl	$4294967295, %esi
+	mov	%esi, %esi
 	movq	%rcx, %rax
 	imulq	%rdx, %rcx
 	imulq	%rsi, %rdx

--
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
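For reference, here is a minimal C sketch of the two carry-propagating 128-bit additions discussed in this thread. It assumes a hypothetical `struct u128` with `hi`/`lo` halves (the quoted snippets use both `low` and `lo`; the sketch uses `lo`), uses `uint64_t` in place of the kernel's `u64`, and the names `add_u128`, `add_u128_cmp` and `u128_add_selfcheck` are made up for illustration. Both variants compute the same value; the difference reported in the mail is only in the code gcc generated for them at the time, which this sketch does not try to reproduce.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves. */
struct u128 {
	uint64_t hi;
	uint64_t lo;
};

/* Form Linus suggested: add both halves, then branch on the carry. */
static struct u128 add_u128(struct u128 a, struct u128 b)
{
	a.hi += b.hi;
	a.lo += b.lo;		/* unsigned add wraps modulo 2^64 */
	if (a.lo < b.lo)	/* wrapped sum is smaller: carry out */
		a.hi++;
	return a;
}

/* Mans' variant: add the comparison result (0 or 1) directly. */
static struct u128 add_u128_cmp(struct u128 a, struct u128 b)
{
	a.hi += b.hi;
	a.lo += b.lo;
	a.hi += a.lo < b.lo;
	return a;
}

/* Quick sanity check: the carry propagates and both variants agree. */
static void u128_add_selfcheck(void)
{
	struct u128 x = { .hi = 0, .lo = UINT64_MAX };
	struct u128 one = { .hi = 0, .lo = 1 };
	struct u128 r1 = add_u128(x, one);
	struct u128 r2 = add_u128_cmp(x, one);

	assert(r1.hi == 1 && r1.lo == 0);
	assert(r2.hi == r1.hi && r2.lo == r1.lo);
}
```

The carry test works because an unsigned sum that wrapped around 2^64 is always smaller than either addend, so comparing against `b.lo` after the addition is enough.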
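The "32-bit values plus an explicit cast" advice quoted in this thread can be illustrated with a portable sketch of a full 64x64->128 multiply built only from 32x32->64 partial products. This is the textbook schoolbook method, not necessarily how the code under discussion was written; `struct u128`, `mul_u64_u64` and `mul_selfcheck` are hypothetical names, and `uint32_t`/`uint64_t` stand in for the kernel's `u32`/`u64`. The assembly diff in the mail came from the author's own code, not from this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit result held as two 64-bit halves. */
struct u128 {
	uint64_t hi;
	uint64_t lo;
};

/*
 * 64x64->128 multiply from four 32x32->64 partial products. The
 * operand halves live in uint32_t variables and are cast to uint64_t
 * right before each multiply, so the compiler can see that a
 * 32x32->64 multiply suffices.
 */
static struct u128 mul_u64_u64(uint64_t a, uint64_t b)
{
	uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
	uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);

	uint64_t p00 = (uint64_t)a0 * (uint64_t)b0;
	uint64_t p01 = (uint64_t)a0 * (uint64_t)b1;
	uint64_t p10 = (uint64_t)a1 * (uint64_t)b0;
	uint64_t p11 = (uint64_t)a1 * (uint64_t)b1;

	/*
	 * Middle column (weight 2^32): high half of p00 plus the low
	 * halves of the cross products. Each term is below 2^32, so the
	 * sum is below 3 * 2^32 and cannot overflow 64 bits.
	 */
	uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

	struct u128 r;
	r.lo = (mid << 32) | (uint32_t)p00;
	r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
	return r;
}

/* Sanity check on the largest possible product. */
static void mul_selfcheck(void)
{
	struct u128 r = mul_u64_u64(UINT64_MAX, UINT64_MAX);

	/* (2^64 - 1)^2 = (2^64 - 2) * 2^64 + 1 */
	assert(r.hi == UINT64_MAX - 1 && r.lo == 1);
}
```

On x86-64 a single 64-bit `mul` already yields the full 128-bit product in a register pair; this portable sketch deliberately does not rely on that.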