RE: FIO log averaging issue with --write_bw_log and --write_iops_log using > 1000 log_avg_msec values

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Sitsofe,
Looks like this is handled in commit be6bb2b72608d7efbec13d06c67446e229136afa. 
I am no longer able to reproduce the issue with latest version of FIO 2.17, with that change as usecs will now be 0.  Could this potentially cause some timing issues?

Prior to that commit...
The job below would result in bw_logs and iops_log that have skewed results both the add_iops_samples and add_bw_samples makes a call to mtime_since with start time as the _sample_time 
./stat.c:2435:  spent = mtime_since(&td->bw_sample_time, t)
./stat.c:2510:  spent = mtime_since(&td->iops_sample_time, t);

./stat.c:2473:  timeval_add_msec(&td->bw_sample_time, td->o.bw_avg_time);
./stat.c:2548:  timeval_add_msec(&td->iops_sample_time, td->o.iops_avg_time);

stat.c:2480:27: runtime error: unsigned integer overflow: 2000 - 18446744073708449 cannot be represented in type 'unsigned long'
stat.c:2555:29: runtime error: unsigned integer overflow: 2000 - 18446744073708449 cannot be represented in type 'unsigned long'

The overflow at stat.c:2265 is a different issue possibly a rounding problem with msec and usecs. 


Thanks,
Chris

On 23 February 2017 at 20:55, Chris Taylor <ctaylor@xxxxxxxxxx> wrote:
> There is an issue in FIO when using the detailed bandwidth and iops logging with averaging over a period of time > 1 second.
> It seems that usecs overflows which later causes negative time diff values resulting in skewed toward 0 results.
> I have attached a potential fix that should prevent usecs from going beyond 1000000.

Is this problem shown by the following job:

./fio --ioengine=null --write_bw_log=/dev/null --log_avg_msec=2000 --size=100M --runtime=10s --time_based --name=overflow

Among Clang's undefined behaviour sanitizer output was this for the above:
stat.c:2265:43: runtime error: unsigned integer overflow: 1999 - 2000 cannot be represented in type 'unsigned long'
stat.c:2265:28: runtime error: unsigned integer overflow: 3999 -
4294967295 cannot be represented in type 'unsigned long'

> diff --git a/time.c b/time.c
> index f5dc049..c748bee 100644
> --- a/time.c
> +++ b/time.c
> @@ -8,11 +8,18 @@ static unsigned long ns_granularity;
>
> void timeval_add_msec(struct timeval *tv, unsigned int msec) {
> -       tv->tv_usec += 1000 * msec;
> -       if (tv->tv_usec >= 1000000) {
> -               tv->tv_usec -= 1000000;
> -               tv->tv_sec++;
> -       }
> +       int adj_usec = 1000 * msec;
> +       int adj_sec = 0;
> +       tv->tv_usec +=  adj_usec;
> +       if (adj_usec >= 1000000) {
> +               adj_sec = adj_usec / 1000000;
> +               tv->tv_usec -=  adj_sec * 1000000;
> +               tv->tv_sec += adj_sec;
> +    }
> +    if (tv->tv_usec >= 1000000){
> +            tv->tv_usec -= 1000000;
> +            tv->tv_sec++;
> +    }
> }

Is it still safe to use int if you're targeting a 32 bit platform?
--
Sitsofe | http://sucs.org/~sits/
��.n��������+%������w��{.n�������^n�r������&��z�ޗ�zf���h���~����������_��+v���)ߣ�

[Index of Archives]     [Linux Kernel]     [Linux SCSI]     [Linux IDE]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux