On Fri, Oct 30, 2009 at 9:20 AM, Chris Worley <worleys@xxxxxxxxx> wrote:
> I'm running fio benchmarks for 120 hours:
>
> fio --rw=randrw --bs=4k --rwmixread=60 --numjobs=64 --iodepth=64
> --sync=0 --direct=1 --randrepeat=0 --ioengine=libaio
> --filename=/dev/sde --filename=/dev/sdf --name=test --loops=10000
> --size=322122547200 --runtime=432000 --group_reporting
>
> The fio threads are generally reporting ~100K IOPS, and each of the 64
> fio threads uses less than 10% CPU.
>
> But every 13 hours (nearly to the minute), for ~200 seconds, the fio
> threads start consuming large amounts of CPU and the IOPS drop to
> ~40K:

Josh is brilliant:

(322122547200 [--size, bytes per job] * 64 [--numjobs]) / (4096 [--bs] * 100000 [IOPS] * 3600 [s/hour]) ~= 14 hours

...it's restarting the next loop and recalculating the random maps, which
lines up with the ~13-hour period we're seeing.

Chris

> top - 07:36:59 up 8 days, 8:12, 0 users, load average: 57.42, 58.54, 58.00
> Tasks: 352 total, 17 running, 335 sleeping, 0 stopped, 0 zombie
> Cpu(s): 1.2%us, 14.8%sy, 0.0%ni, 72.4%id, 10.1%wa, 1.3%hi, 0.2%si, 0.0%st
> Mem: 49449752k total, 6643996k used, 42805756k free, 205832k buffers
> Swap: 0k total, 0k used, 0k free, 1432664k cached
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
> 22108 root  18   0 86144  10m 9948 R 80.6  0.0 137:55.21 fio
> 22074 root  18   0 86144  10m 9952 R 70.8  0.0 138:07.68 fio
> 22101 root  18   0 86144  10m 9952 R 61.0  0.0 138:18.43 fio
> 22095 root  16   0 86144  10m 9948 D 51.1  0.0 137:54.02 fio
> 22134 root  16   0 86144  10m 9948 R 51.1  0.0 138:03.73 fio
> 22068 root  16   0 86144  10m 9952 R 49.2  0.0 138:04.16 fio
> 22083 root  16   0 86144  10m 9952 D 45.2  0.0 138:09.32 fio
> 22071 root  16   0 86144  10m 9948 D 29.5  0.0 137:37.75 fio
> 22128 root  16   0 86144  10m 9952 D 25.6  0.0 137:48.11 fio
> 22092 root  16   0 86144  10m 9948 D 11.8  0.0 137:54.14 fio
> 22097 root  16   0 86144  10m 9948 D  9.8  0.0 138:07.81 fio
> ...
>
> There is nothing in the system to account for this (it does not coincide
> with, e.g., logrotate or cron jobs).
>
> I've repeated this with fio 1.21 and 1.31, on RHEL 5.3 and RHEL 5.4. In
> the 1.31/RHEL 5.4 case the period was ~15 hours... but that was also a
> slower machine.
>
> Any idea what's going on?
>
> Thanks,
>
> Chris
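
A quick sanity check of the arithmetic above (this assumes bc is installed
and the flat ~100K aggregate IOPS reported in the original post):

    $ echo "scale=2; 322122547200 * 64 / (4096 * 100000 * 3600)" | bc
    13.98

...so just under 14 hours at exactly 100K IOPS; with the actual rate running
a little above 100K, that lands right on the observed ~13-hour period.

If the stall really is the per-loop random map rebuild, one way to confirm
(and avoid) it is to drop the block map entirely with --norandommap. This is
only a sketch: with the map disabled fio no longer guarantees each block is
hit exactly once per pass, so coverage becomes approximate. All other options
are copied unchanged from the command line above:

    fio --rw=randrw --bs=4k --rwmixread=60 --numjobs=64 --iodepth=64 \
        --sync=0 --direct=1 --randrepeat=0 --ioengine=libaio \
        --filename=/dev/sde --filename=/dev/sdf --name=test --loops=10000 \
        --size=322122547200 --runtime=432000 --group_reporting \
        --norandommap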