On Wed, 8 Feb 2017, Andy Lutomirski wrote: > > +#ifdef CONFIG_HYPERV_CLOCK > > +/* (a * b) >> 64 implementation */ > > +static inline u64 mul64x64_hi(u64 a, u64 b) > > +{ > > + u64 a_lo, a_hi, b_lo, b_hi, p1, p2; > > + > > + a_lo = (u32)a; > > + a_hi = a >> 32; > > + b_lo = (u32)b; > > + b_hi = b >> 32; > > + p1 = a_lo * b_hi; > > + p2 = a_hi * b_lo; > > + > > + return a_hi * b_hi + (p1 >> 32) + (p2 >> 32) + > > + ((((a_lo * b_lo) >> 32) + (u32)p1 + (u32)p2) >> 32); > > + > > +} > > Unless GCC is waaay more clever than I think, this is hugely > suboptimal on 64-bit. x86 can do this in a single instruction, and > gcc can express it cleanly using __uint128_t. I wouldn't be terribly > surprised if the 32-bit generated code was fine, too. We already have that: mul_u64_u64_shr() which can be replaced by an arch specific implementation. Nobody bothered to do that for x86 yet, but we definitely don't want to open code it another time Thanks, tglx _______________________________________________ devel mailing list devel@xxxxxxxxxxxxxxxxxxxxxx http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel