Anton,

Did you see Elliot's suggestions (i.e. point 4) in his previous email
(http://www.spinics.net/lists/fio/msg06561.html) for speeding things up and
how you are at the mercy of glibc?

On 28 November 2017 at 14:25, Gavriliuk, Anton (HPS Ukraine)
<anton.gavriliuk@xxxxxxx> wrote:
> Ok, I have fixed that now. It seems the order of the keys matters. Now I have the next issue :-)
>
> I am trying to speed up the customer's script.
[snip]
>      lat (usec): min=1, max=77, avg=10.57, stdev= 6.01
>     clat percentiles (usec):
>      |  1.00th=[    0],  5.00th=[    0], 10.00th=[    0], 20.00th=[    0],
>      | 30.00th=[    0], 40.00th=[    0], 50.00th=[    0], 60.00th=[    0],
>      | 70.00th=[    0], 80.00th=[    1], 90.00th=[    1], 95.00th=[    1],
>      | 99.00th=[    1], 99.50th=[    1], 99.90th=[    1], 99.95th=[    1],
>      | 99.99th=[    1]
>     bw (KB /s): min=38528, max=118396, per=6.36%, avg=103720.16, stdev=8779.81
>   write: io=4559.9GB, bw=30234MB/s, iops=594518, runt=154436msec
>     slat (usec): min=1, max=787, avg=16.46, stdev=10.00
>     clat (usec): min=0, max=680, avg= 0.20, stdev= 0.41
>      lat (usec): min=1, max=789, avg=16.69, stdev=10.00
>     clat percentiles (usec):
>      |  1.00th=[    0],  5.00th=[    0], 10.00th=[    0], 20.00th=[    0],
>      | 30.00th=[    0], 40.00th=[    0], 50.00th=[    0], 60.00th=[    0],
>      | 70.00th=[    0], 80.00th=[    1], 90.00th=[    1], 95.00th=[    1],
>      | 99.00th=[    1], 99.50th=[    1], 99.90th=[    1], 99.95th=[    1],
>      | 99.99th=[    1]
>     bw (MB /s): min=  731, max= 2015, per=6.36%, avg=1922.69, stdev=146.41
>     lat (usec) : 2=100.00%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
>     lat (usec) : 250=0.01%, 750=0.01%
>   cpu          : usr=35.61%, sys=64.37%, ctx=2509, majf=0, minf=930
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued    : total=r=4836244/w=91815119/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>   READ: io=245977MB, aggrb=1592.8MB/s, minb=1592.8MB/s, maxb=1592.8MB/s, mint=154436msec, maxt=154436msec
>   WRITE: io=4559.9GB, aggrb=30234MB/s, minb=30234MB/s, maxb=30234MB/s, mint=154436msec, maxt=154436msec
>
> Now I want to keep the customer's I/O pattern but switch to DAX access (the /dev/dax0.0 raw device) with the mmap engine, trying to bypass the OS block device and system call layers. So I modified the script:
>
> dl560g10spmem01:~/FIO-PMEM_TestScripts/DAX # /usr/bin/fio --name=4-rand-rw-3xx --ioengine=mmap --iodepth=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --bssplit=4k/4:8k/7:16k/7:32k/15:64k/65:128k/1:256k/1 --rwmixread=5 --size=290g --numjobs=16 --group_reporting --runtime=120 --filename=/dev/dax0.0
> 4-rand-rw-3xx: (g=0): rw=randrw, bs=4K-256K/4K-256K/4K-256K, ioengine=mmap, iodepth=1
> ...
> fio-2.12
> Starting 16 processes
> Jobs: 12 (f=12): [m(3),_(2),m(1),_(1),m(6),_(1),m(2)] [78.6% done] [1304MB/24907MB/0KB /s] [25.7K/490K/0 iops] [eta 00m:33s]
> 4-rand-rw-3xx: (groupid=0, jobs=16): err= 0: pid=65032: Tue Nov 28 06:58:09 2017
>   read : io=202831MB, bw=1690.3MB/s, iops=33231, runt=120001msec
>     clat (usec): min=0, max=211, avg= 8.88, stdev= 5.23
>      lat (usec): min=0, max=211, avg= 8.92, stdev= 5.23
>     clat percentiles (usec):
>      |  1.00th=[    1],  5.00th=[    1], 10.00th=[    2], 20.00th=[    4],
>      | 30.00th=[    6], 40.00th=[    8], 50.00th=[   10], 60.00th=[   11],
>      | 70.00th=[   11], 80.00th=[   12], 90.00th=[   13], 95.00th=[   14],
>      | 99.00th=[   29], 99.50th=[   43], 99.90th=[   48], 99.95th=[   49],
>      | 99.99th=[   53]
>     bw (KB /s): min=86408, max=159184, per=6.36%, avg=110086.50, stdev=19409.13
>   write: io=3764.3GB, bw=32121MB/s, iops=631607, runt=120001msec
>     clat (usec): min=0, max=744, avg=15.16, stdev=10.09
>      lat (usec): min=0, max=744, avg=15.21, stdev=10.09
>     clat percentiles (usec):
>      |  1.00th=[    1],  5.00th=[    2], 10.00th=[    3], 20.00th=[    6],
>      | 30.00th=[   11], 40.00th=[   12], 50.00th=[   12], 60.00th=[   20],
>      | 70.00th=[   21], 80.00th=[   22], 90.00th=[   23], 95.00th=[   24],
>      | 99.00th=[   46], 99.50th=[   86], 99.90th=[   91], 99.95th=[   92],
>      | 99.99th=[   96]
>     bw (MB /s): min= 1760, max= 2737, per=6.36%, avg=2043.01, stdev=353.05
>     lat (usec) : 2=4.63%, 4=8.45%, 10=11.27%, 20=32.41%, 50=42.58%
>     lat (usec) : 100=0.66%, 250=0.01%, 500=0.01%, 750=0.01%
>   cpu          : usr=99.74%, sys=0.24%, ctx=2791, majf=0, minf=2822710
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued    : total=r=3987782/w=75793478/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>   READ: io=202831MB, aggrb=1690.3MB/s, minb=1690.3MB/s, maxb=1690.3MB/s, mint=120001msec, maxt=120001msec
>   WRITE: io=3764.3GB, aggrb=32121MB/s, minb=32121MB/s, maxb=32121MB/s, mint=120001msec, maxt=120001msec
>
> No big improvement there: lat 16.69 us vs 15.21 us, write bw 30234 MB/s vs 32121 MB/s.
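For context on where the time goes with this setup: fio's mmap engine maps the target once and then services every block with a plain memcpy() into or out of that mapping, so against /dev/dax0.0 each "I/O" is essentially a glibc memcpy plus whatever page-fault cost is paid on first touch. Below is a minimal standalone sketch of what each job is effectively doing; the mapping length and block size are arbitrary placeholders, and this is not fio's actual code.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/dev/dax0.0";   /* placeholder target */
    size_t map_len = 1UL << 30;         /* assume a 1 GiB mapping for the sketch;
                                           device dax usually wants this aligned to
                                           the device alignment (often 2 MB) */
    size_t bs = 64 * 1024;              /* one of the bssplit block sizes */

    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* map once up front, the way the mmap engine keeps a long-lived mapping */
    char *base = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    char *buf = malloc(bs);
    if (!buf) { perror("malloc"); return 1; }
    memset(buf, 0xa5, bs);

    /* a "write" is just a memcpy into the mapping; the first touch of each
       region is what shows up as dax_dev_pmd_fault in the trace further down */
    off_t off = (off_t)(random() % (long)(map_len / bs)) * (off_t)bs;
    memcpy(base + off, buf, bs);

    /* a "read" is a memcpy back out */
    memcpy(buf, base + off, bs);

    munmap(base, map_len);
    free(buf);
    close(fd);
    return 0;
}

Under this model almost all CPU time should land in libc's memcpy, which matches the ~99% usr CPU above and the __memcpy_avx_unaligned samples in the trace below.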
>
> Looking at the performance traces:
>
> PID 65043  /usr/bin/fio
>
> HARDCLOCK entries
>    Count     Pct  State  Function
>     2210  66.29%  USER   __memcpy_avx_unaligned  [/lib64/libc-2.22.so]
>     1122  33.65%  USER   UNKNOWN
>        2   0.06%  SYS    pagerange_is_ram_callback
>
>    Count     Pct  HARDCLOCK Stack trace
>    ============================================================
>        2   0.06%  pagerange_is_ram_callback  walk_system_ram_range  pat_pagerange_is_ram  lookup_memtype  track_pfn_insert  vmf_insert_pfn_pmd  dax_dev_pmd_fault  handle_mm_fault  __do_page_fault  do_page_fault  page_fault  unknown  |  __memcpy_avx_unaligned
>
> 25.620925 cpu=51 pid=65043 tgid=65043 hardclock state=USER [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
> 25.644924 cpu=51 pid=65043 tgid=65043 hardclock state=USER [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
> 25.656923 cpu=51 pid=65043 tgid=65043 hardclock state=USER [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
> 25.680925 cpu=51 pid=65043 tgid=65043 hardclock state=USER [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
> 25.692924 cpu=51 pid=65043 tgid=65043 hardclock state=USER [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
> 25.704925 cpu=51 pid=65043 tgid=65043 hardclock state=USER [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
> 25.716925 cpu=51 pid=65043 tgid=65043 hardclock state=USER [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
> 25.728923 cpu=51 pid=65043 tgid=65043 hardclock state=USER [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
> 25.764924 cpu=51 pid=65043 tgid=65043 hardclock state=USER [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
> 25.776924 cpu=51 pid=65043 tgid=65043 hardclock state=USER [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
>
> It seems there is a lot of unaligned memory access, any ideas?
>
> You could also look at the source code for fio to see what it's doing or put in some printf() statements so you can understand what the actual alignment is.
>
> Anton

--
Sitsofe | http://sucs.org/~sits/
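As a rough sketch of the printf() route suggested above: something like the helper below, called from wherever the copy actually happens, would show how the buffer addresses and offsets line up. The function name and call site are illustrative only, not anything that exists in fio.

#include <stdint.h>
#include <stdio.h>

/* Print how a buffer address and file offset line up against common
 * boundaries.  The idea is to call this (or just the printf itself) right
 * before the engine's memcpy; the name and call site here are made up. */
static void report_alignment(const char *tag, const void *buf,
                             unsigned long long off, unsigned long long len)
{
    uintptr_t a = (uintptr_t)buf;

    printf("%s: buf=%p len=%llu buf%%64=%lu off=%llu off%%64=%llu off%%4096=%llu\n",
           tag, (void *)a, len, (unsigned long)(a % 64),
           off, off % 64, off % 4096);
}

int main(void)
{
    /* quick self-test with a deliberately misaligned pointer */
    static char blob[256];

    report_alignment("demo", blob + 3, 4096 + 5, 64 * 1024);
    return 0;
}

One caveat worth keeping in mind: __memcpy_avx_unaligned is simply the variant glibc's IFUNC resolver selects for this CPU, and the "unaligned" in its name refers to the unaligned vector loads/stores the implementation uses, so its presence in the profile does not by itself prove the source or destination addresses are misaligned; printing them as above is a way to check.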