32 x 32 -> 64 bit widening multiply on x86_64

"Mark Dickinson" <dickinsm@xxxxxxxxx> · Tue, 11 Nov 2008 10:08:24 +0000

Hello all,

I've been playing around with a multiprecision arithmetic library
(actually, it's the Python programming language's implementation
of long integer arithmetic), and the following question came up:

Is there a reliable way to persuade gcc to produce a 32 x 32 -> 64 bit
widening multiply on x86 and x86_64 platforms, without resorting to
assembler?  I tried compiling the following code on a 3rd gen.
MacBook Pro (OS X 10.5.5, Core 2 Duo):

/* begin file test.c */
#include <stdint.h>
extern uint64_t
digit_mul(uint32_t a, uint32_t b) {
    return (uint64_t)a * b;
}
/* end file test.c */

using:

gcc-mp-4.3 -m64 -O3 -S test.c

(here gcc-mp-4.3 is gcc version 4.3 from MacPorts) and I was a little
surprised by what I got:  the assembly output test.s contained:

.globl _digit_mul
_digit_mul:
LFB2:
        pushq   %rbp
LCFI0:
        mov     %esi, %eax
        mov     %edi, %edi
        imulq   %rdi, %rax
        movq    %rsp, %rbp
LCFI1:
        leave
        ret
LFE2:

If I'm not mistaken, that imulq is a 64 x 64 -> 64 bit multiply, applied
after the two mov instructions have zero-extended the 32-bit arguments
to 64 bits; this seems inefficient, when a 32 x 32 -> 64 bit multiply
ought to be good enough.  Is there a good reason for having a 64-bit
multiply here?  Or is gcc just not in a good position to make this kind
of optimization?
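
To make the comparison concrete, here is a small sketch in C of what the
64-bit output computes.  The names mul_via_64 and mul_ref are made up for
illustration; mul_ref rebuilds the product from 16-bit halves purely as an
independent check.  Since (2^32 - 1)^2 < 2^64, the low 64 bits of the
64 x 64 product of zero-extended operands are already the full 32 x 32
product, so the result is correct either way and the question is only
about speed.

```c
#include <stdint.h>

/* Hypothetical model of the -m64 output: zero-extend both 32-bit
   arguments, then do a 64 x 64 -> 64 bit multiply (the imulq). */
uint64_t mul_via_64(uint32_t a, uint32_t b) {
    uint64_t wa = a;   /* zero-extension, like "mov %edi, %edi" */
    uint64_t wb = b;   /* zero-extension, like "mov %esi, %eax" */
    return wa * wb;    /* low 64 bits of the 128-bit product */
}

/* Hypothetical reference: the same product assembled from four
   16 x 16 -> 32 bit partial products, none of which can overflow. */
uint64_t mul_ref(uint32_t a, uint32_t b) {
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;
    uint64_t lo   = a_lo * b_lo;
    uint64_t mid1 = a_lo * b_hi;
    uint64_t mid2 = a_hi * b_lo;
    uint64_t hi   = a_hi * b_hi;
    /* The sum equals the true product, which is below 2^64. */
    return (hi << 32) + (mid1 << 16) + (mid2 << 16) + lo;
}
```

Both functions should agree for every pair of 32-bit inputs, including
the largest one, 0xFFFFFFFF * 0xFFFFFFFF = 0xFFFFFFFE00000001.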

Compiling in 32-bit mode gives me exactly what I'd expect:
after "gcc-mp-4.3 -m32 -O3 -S test.c", test.s contains:

.globl _digit_mul
_digit_mul:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    12(%ebp), %eax
        mull    8(%ebp)
        leave
        ret
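
For context on why the 32-bit output is what I'd hope for: mull leaves
the 64-bit product in EDX:EAX, and in multiprecision code the low half
becomes the result digit while the high half becomes the carry.  The
following is only a sketch of that pattern, assuming 32-bit digits stored
least significant first; mul_by_digit is a made-up name and this is not
taken from Python's sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiply the n-digit number in[] (least
   significant digit first) by the single digit m, store n digits
   in out[], and return the final carry. */
uint32_t mul_by_digit(uint32_t *out, const uint32_t *in, size_t n,
                      uint32_t m) {
    uint32_t carry = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        /* Cannot overflow: (2^32-1)^2 + (2^32-1) < 2^64. */
        uint64_t t = (uint64_t)in[i] * m + carry;
        out[i] = (uint32_t)t;          /* low half (EAX after mull) */
        carry = (uint32_t)(t >> 32);   /* high half (EDX after mull) */
    }
    return carry;
}
```

The inner statement is exactly the widening multiply from test.c plus a
carry, which is why the quality of the generated multiply matters here.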

Any insights would be appreciated!

Mark