On Sat, Apr 17, 2021 at 04:24:43PM +0200, Peter Zijlstra wrote:
> On Sat, Apr 17, 2021 at 01:46:23PM +0200, Willy Tarreau wrote:
> > For me the old trick of casting one side as long long still works:
> >
> > unsigned long long mul3264(unsigned int a, unsigned int b)
> > {
> >         return (unsigned long long)a * b;
> > }
> >
> > i386:
> > 00000000 <mul3264>:
> >    0:   8b 44 24 08             mov    0x8(%esp),%eax
> >    4:   f7 64 24 04             mull   0x4(%esp)
> >    8:   c3                      ret
> >
> > x86_64:
> > 0000000000000000 <mul3264>:
> >    0:   89 f8                   mov    %edi,%eax
> >    2:   89 f7                   mov    %esi,%edi
> >    4:   48 0f af c7             imul   %rdi,%rax
> >    8:   c3                      retq
> >
> > Or maybe you had something else in mind ?
>
> Last time I tried it, the thing refused :/ which is how we ended up with
> mul_u32_u32() in asm.

Oh, I trust you; I do remember noticing it with one gcc version as well
(maybe 4.5). But I've been using this successfully since 2.95, and I
quickly rechecked that 4.7, 4.8, 5.4, 6.5, 7.4, 9.3 and 11-trunk do
produce the code above, which is reassuring, as we all prefer to limit
the number of asm statements.

Willy