Re: FIO log averaging issue with --write_bw_log and --write_iops_log using > 1000 log_avg_msec values

Sitsofe Wheeler <sitsofe@xxxxxxxxx> · Sun, 2 Apr 2017 09:07:21 +0100

Hi Jens,

Chris has found that timing inaccuracy creeps when adding times
results in a two or more second overflow:

On 2 March 2017 at 21:52, Chris Taylor <ctaylor@xxxxxxxxxx> wrote:
> Tests pretty much prove we are getting close to what we expect:
> fio --eta=never --ioengine=libaio --readwrite=write --time_based=1 --runtime=60 --filename=/tmp/fio --size=1M --direct=1 --name=test --write_iops_log=/tmp/test --log_avg_msec=5500 --log_unix_epoch=1
> 1488489137632, 6413, 1, 0
> 1488489143131, 5503, 1, 0
> 1488489148631, 5585, 1, 0
> 1488489154131, 5883, 1, 0
> 1488489159631, 5726, 1, 0
> 1488489165131, 5331, 1, 0
> 1488489170631, 5772, 1, 0
> 1488489176131, 5661, 1, 0
> 1488489181631, 5970, 1, 0
> 1488489187131, 5776, 1, 0
> [5499, 5500, 5500, 5500, 5500, 5500, 5500, 5500, 5500]
>
> However, the problem still exists where in timeval_add_msec gets a value greater than 1000 msecs the tv->tv_usec it will grow beyond usecs (1000000) and causes unexpected behavior.  Possibly this is the better fix to avoid other callers having to adjust for  the overflow in timeval_add_msecs?\
> diff --git a/time.c b/time.c
> index f5dc049..a318ad5 100644
> --- a/time.c
> +++ b/time.c
> @@ -9,10 +9,9 @@ static unsigned long ns_granularity;
>  void timeval_add_msec(struct timeval *tv, unsigned int msec)
>  {
>         tv->tv_usec += 1000 * msec;
> -       if (tv->tv_usec >= 1000000) {
> -               tv->tv_usec -= 1000000;
> -               tv->tv_sec++;
> -       }
> +    tv->tv_sec += tv->tv_usec / 1000000;
> +    tv->tv_usec %= 1000000;
> +
>  }
>  /*
> Running the same job as above with the change:
> 1488489763099, 5773, 1, 0
> 1488489768599, 5852, 1, 0
> 1488489774099, 5654, 1, 0
> 1488489779599, 5482, 1, 0
> 1488489785099, 5650, 1, 0
> 1488489790599, 5752, 1, 0
> 1488489796099, 5390, 1, 0
> 1488489801599, 5746, 1, 0
> 1488489807099, 5610, 1, 0
> 1488489812599, 5361, 1, 0
> [5500, 5500, 5500, 5500, 5500, 5500, 5500, 5500, 5500]

Something like the proposed patch looks necessary because you need to
know how many seconds need to be deducted from tv->tv_usec and added
to tv->sec . The only snag I see is with the patch is the introduction
a division and a modulus (https://godbolt.org/g/xrhz7X suggests it's
the division that's most costly). I tried rearranging the code to look
like https://www.dovecot.org/list/dovecot-cvs/2014-September/024976.html
but that actually seemed slower.

Looking at some cachegrind results over on
https://gist.github.com/sitsofe/45fa767f72439cd1f8c148794382003b
suggests the existing code takes the branch 25% of the time and the
branch is mis-predicted around 25% of the time. Looking only at
instructions retired shows the patch doubles the number of
instructions that need to be retired but eliminates branch
mis-predictions. You could reintroduce the if thus skipping the extra
work 75% of the time but whether doing so is a win depends on the cost
of mis-predicting a branch. If you assume a mis-predict costs nothing
there's a difference of 3000 cycles over one minute and
since mis-predictions will reduce the difference I think the patch should
go in as is.

-- 
Sitsofe | http://sucs.org/~sits/
--
To unsubscribe from this list: send the line "unsubscribe fio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html