Hello all,

I've been playing around with a multiprecision arithmetic library
(actually, the Python programming language's implementation of long
integer arithmetic), and the following question came up: is there a
reliable way to persuade gcc to produce a 32 x 32 -> 64 bit widening
multiply on x86 and x86_64 platforms, without resorting to assembler?

I tried compiling the following code on a 3rd gen. MacBook Pro
(OS X 10.5.5, Core 2 Duo):

/* begin file test.c */
#include <stdint.h>

extern uint64_t digit_mul(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
/* end file test.c */

using:

    gcc-mp-4.3 -m64 -O3 -S test.c

(here gcc-mp-4.3 is gcc version 4.3 from MacPorts), and I was a little
surprised by what I got: the assembly output test.s contained:

.globl _digit_mul
_digit_mul:
LFB2:
        pushq   %rbp
LCFI0:
        mov     %esi, %eax
        mov     %edi, %edi
        imulq   %rdi, %rax
        movq    %rsp, %rbp
LCFI1:
        leave
        ret
LFE2:

If I'm not mistaken, that imulq is a 64 x 64 -> 64 bit multiply; this
seems inefficient, when a 32 x 32 -> 64 bit multiply ought to be good
enough.  Is there a good reason for having a 64-bit multiply here?  Or
is gcc just not in a good position to make this kind of optimization?

Compiling in 32-bit mode gives me exactly what I'd expect: after
"gcc-mp-4.3 -m32 -O3 -S test.c", test.s contains:

.globl _digit_mul
_digit_mul:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    12(%ebp), %eax
        mull    8(%ebp)
        leave
        ret

Any insights would be appreciated!

Mark
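
P.S. For context, the place where this matters is the inner loop of a
multiple-precision multiplication.  The sketch below is my own
simplified illustration, not the actual CPython code: it assumes 32-bit
digits stored least significant first, and the names (digit, twodigits,
mul_add_digit) are made up for this example.

/* begin sketch */
#include <stdint.h>
#include <stddef.h>

typedef uint32_t digit;      /* one 32-bit digit of a big integer */
typedef uint64_t twodigits;  /* wide enough for a 32 x 32 -> 64 product */

/* Multiply the n-digit number a (least significant digit first) by a
 * single digit m, adding the product into res; return the final carry.
 * Each iteration wants exactly one widening multiply like digit_mul(). */
digit
mul_add_digit(digit *res, const digit *a, size_t n, digit m)
{
    twodigits carry = 0;
    for (size_t i = 0; i < n; ++i) {
        carry += (twodigits)a[i] * m + res[i];  /* 32 x 32 -> 64 multiply */
        res[i] = (digit)carry;                  /* store the low 32 bits */
        carry >>= 32;                           /* carry the high 32 bits */
    }
    return (digit)carry;
}
/* end sketch */

Since the widening multiply is paid once per digit, I'd like the
tightest code gcc can manage for it.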