On 2012-12-21 16:33, Jens Axboe wrote:
> On 2012-12-20 20:23, Sam Bradshaw wrote:
>>
>>>> diff --git a/gettime.c b/gettime.c
>>>> index 035d275..89f3e27 100644
>>>> --- a/gettime.c
>>>> +++ b/gettime.c
>>>> @@ -168,17 +168,23 @@ void fio_gettime(struct timeval *tp, void fio_unused *caller)
>>>>  	}
>>>>  #ifdef ARCH_HAVE_CPU_CLOCK
>>>>  	case CS_CPUCLOCK: {
>>>> -		unsigned long long usecs, t;
>>>> +		unsigned long long usecs, t, delta = 0;
>>>>
>>>>  		t = get_cpu_clock();
>>>>  		if (tv && t < tv->last_cycles) {
>>>>  			dprint(FD_TIME, "CPU clock going back in time\n");
>>>>  			t = tv->last_cycles;
>>>> -		} else if (tv)
>>>> +		} else if (tv) {
>>>> +			if (tv->last_tv_valid)
>>>> +				delta = t - tv->last_cycles;
>>>>  			tv->last_cycles = t;
>>>> +		}
>>>>
>>>>  		usecs = t / cycles_per_usec;
>>>> -		tp->tv_sec = usecs / 1000000;
>>>> +		if (delta && delta < 1000000)
>>>> +			tp->tv_sec = tv->last_tv.tv_sec;
>>>> +		else
>>>> +			tp->tv_sec = usecs / 1000000;
>>>>  		tp->tv_usec = usecs % 1000000;
>>>>  		break;
>>>>  	}
>>>
>>> I was thinking about this... Is it actually guaranteed to work? If
>>> tv->last_tv.tv_usec is e.g. 900,000, you'd only need a 100k usec diff to
>>> need to wrap, not 1000k. And since this is about avoiding costly divs,
>>> since we know the number of cycles last time, it might make more sense
>>> to just do the single div to go from cycles to usecs, then add that to
>>> the tv->last_tv.
>>>
>>
>>
>>
>> Something like this might work, though that amount of logic may
>> be equivalent in terms of cycles to the divide.
>
> So I took a look at it. The costly bit is the division by
> cycles_per_usec, which the compiler has no other option than to turn
> into a divq. The modulo and divide by 1M can be turned into something
> more clever, basically shifts and imull.
>
> So how about the below? It turns the divq into multiplication and
> division by 10M, which should be considerably less expensive. Can you
> test and see how that works for you?

Actually, it'd be dumb not to make it a power-of-2, since the actual
number doesn't really matter. So this uses 2^24, try that.

diff --git a/gettime.c b/gettime.c
index 035d275..df329f6 100644
--- a/gettime.c
+++ b/gettime.c
@@ -15,6 +15,7 @@
 
 #ifdef ARCH_HAVE_CPU_CLOCK
 static unsigned long cycles_per_usec;
+static unsigned long inv_cycles_per_usec;
 int tsc_reliable = 0;
 #endif
 
@@ -177,7 +178,7 @@ void fio_gettime(struct timeval *tp, void fio_unused *caller)
 		} else if (tv)
 			tv->last_cycles = t;
 
-		usecs = t / cycles_per_usec;
+		usecs = (t * inv_cycles_per_usec) / 16777216UL;
 		tp->tv_sec = usecs / 1000000;
 		tp->tv_usec = usecs % 1000000;
 		break;
@@ -277,6 +278,8 @@ static void calibrate_cpu_clock(void)
 	dprint(FD_TIME, "mean=%f, S=%f\n", mean, S);
 
 	cycles_per_usec = avg;
+	inv_cycles_per_usec = 16777216UL / cycles_per_usec;
+	dprint(FD_TIME, "inv_cycles_per_usec=%lu\n", inv_cycles_per_usec);
 }
 #else
 static void calibrate_cpu_clock(void)

-- 
Jens Axboe
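
For anyone who wants to try the arithmetic outside of fio, here is a
minimal standalone sketch of the reciprocal trick used in the patch
above. It is not fio code: the helper names (make_inv_cycles_per_usec,
cycles_to_usecs_mul, cycles_to_usecs_div, max_safe_cycles) and the
CLOCK_SHIFT macro are made up for illustration, and it assumes 64-bit
unsigned arithmetic throughout. It shows two things worth checking when
testing: the truncated reciprocal makes the result read slightly low
unless cycles_per_usec is itself a power of 2, and t * inv wraps once
the cycle counter gets large enough.

```c
#include <stdint.h>

/* Fixed-point scale from the patch: 2^24 == 16777216. */
#define CLOCK_SHIFT 24

/*
 * Hypothetical stand-in for the calibration step: precompute the
 * scaled reciprocal once, so the hot path needs no 64-bit divide.
 * The integer division truncates, so the reciprocal is rounded down.
 */
static uint64_t make_inv_cycles_per_usec(uint64_t cycles_per_usec)
{
	return (UINT64_C(1) << CLOCK_SHIFT) / cycles_per_usec;
}

/* Hot path: one multiply and one shift instead of a divq. */
static uint64_t cycles_to_usecs_mul(uint64_t t, uint64_t inv)
{
	return (t * inv) >> CLOCK_SHIFT;
}

/* Reference: the plain division the patch replaces. */
static uint64_t cycles_to_usecs_div(uint64_t t, uint64_t cycles_per_usec)
{
	return t / cycles_per_usec;
}

/*
 * Largest cycle count for which t * inv still fits in 64 bits.
 * Beyond this the product wraps and the result is garbage.
 */
static uint64_t max_safe_cycles(uint64_t inv)
{
	return UINT64_MAX / inv;
}
```

With a hypothetical 3000 cycles/usec clock, the reciprocal is 5592
(the exact value would be about 5592.4), so one second's worth of
cycles converts to 999927 usec rather than 1000000, an error of
roughly 0.007%. At that rate max_safe_cycles() corresponds to about
12.7 days of cycle count before the multiply wraps, which may or may
not matter depending on where the counter starts.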