On Tue, Apr 24, 2012 at 2:32 PM, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
> On Tue, 2012-04-24 at 14:15 -0700, Andy Lutomirski wrote:
>> > The second two implement a few u128 operations so we can do 128bit math.. I
>> > know a few people will die a little inside, but having nanosecond granularity
>> > time accounting leads to very big numbers very quickly and when you need to
>> > multiply them 64bit really isn't that much.
>>
>> I played with some of this stuff a while ago, and for timekeeping, it
>> seemed like a 64x32->96 bit multiply followed by a right shift was
>> enough, and that operation is a lot faster on 32-bit architectures than
>> a full 64x64->128 multiply.
>
> The SCHED_DEADLINE use case is not that, it multiplies two time
> intervals. Basically it needs to evaluate if a task activation still
> fits in the old period or if it needs to shift the deadline and start a
> new period.
>
> It needs to do: runtime / (deadline - t) < budget / period
> which transforms into: runtime * period < budget * (deadline - t)
>
> hence the 64x64->128 mult and 128 compare.

Fair enough.

>
>> Something like:
>>
>> uint64_t mul_64_32_shift(uint64_t a, uint32_t mult, uint32_t shift)
>> {
>>         return (uint64_t)( ((__uint128_t)a * (__uint128_t)mult) >> shift );
>> }
>
> That looks a lot like what we grew mult_frac() for, it does:
>
> /*
>  * Multiplies an integer by a fraction, while avoiding unnecessary
>  * overflow or loss of precision.
>  */
> #define mult_frac(x, numer, denom)(                     \
> {                                                       \
>         typeof(x) quot = (x) / (denom);                 \
>         typeof(x) rem  = (x) % (denom);                 \
>         (quot * (numer)) + ((rem * (numer)) / (denom)); \
> }                                                       \
> )
>
>
> and is used in __cycles_2_ns() and friends.

Yeesh.  That looks way slower, and IIRC __cycles_2_ns overflows every
few seconds on modern machines.

gcc 4.6 generates this code for mul_64_32_shift:

mul_64_32_shift:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    %edx, %ecx
        movl    %esi, %eax
        mulq    %rdi
        movq    %rdx, %rsi
        shrq    %cl, %rsi
        shrdq   %cl, %rdx, %rax
        testb   $64, %cl
        cmovneq %rsi, %rax
        popq    %rbp
        ret

which is a bit dumb if you can make assumptions about the shift.  See
http://gcc.gnu.org/PR46514.  Some use cases might be able to guarantee
that the shift is less than 32 bits, in which case hand-written assembly
would be a few cycles faster.

--Andy
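
The 64x32->96 multiply followed by a bounded right shift can be sketched
in portable C with two 32x32->64 multiplies, so no 128-bit type is needed
on 32-bit architectures. This is only an illustration: the name
mul_64_32_shift_small and the shift <= 32 precondition are assumptions
made here, not existing kernel code, and the result is truncated to 64
bits just like the __uint128_t version.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): returns the low 64 bits of
 * (a * mult) >> shift, for 0 <= shift <= 32.
 *
 * Split a into 32-bit halves, so that
 *     a * mult = hi * 2^32 + lo
 * where hi and lo are each a 32x32->64 product. Because shift <= 32,
 * hi * 2^32 is an exact multiple of 2^shift, hence
 *     (a * mult) >> shift = (hi << (32 - shift)) + (lo >> shift).
 */
static uint64_t mul_64_32_shift_small(uint64_t a, uint32_t mult, uint32_t shift)
{
	uint64_t lo, hi;

	assert(shift <= 32);	/* assumed precondition, see above */

	lo = (uint64_t)(uint32_t)a * mult;	/* low half of a times mult */
	hi = (a >> 32) * mult;			/* high half of a times mult */

	/* Unsigned arithmetic wraps, which gives the 64-bit truncation. */
	return (hi << (32 - shift)) + (lo >> shift);
}
```

With the shift bounded this way there is no need for the variable
128-bit shift sequence (shrq/shrdq/testb/cmovneq) seen in the gcc output;
whether this actually beats the __uint128_t version on a given compiler
and architecture would have to be measured.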