RE: fio 3.2

Ok, that's fixed now - it seems the order of the options matters.  Now I have the next issue :-)

I am trying to speed up a customer's script.

Initially it ran against /dev/pmem with the libaio engine:

dl560g10spmem01:~/FIO-PMEM_TestScripts/DAX # /usr/bin/fio --filename=/dev/pmem1 --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bssplit=4k/4:8k/7:16k/7:32k/15:64k/65:128k/1:256k/1 --rwmixread=5 --iodepth=1 --numjobs=16 --runtime=1800 --group_reporting --name=4-rand-rw-3xx
4-rand-rw-3xx: (g=0): rw=randrw, bs=4K-256K/4K-256K/4K-256K, ioengine=libaio, iodepth=1
...
fio-2.12
Starting 16 processes
Jobs: 2 (f=2): [_(5),m(1),_(7),m(1),_(2)] [100.0% done] [357.4MB/6990MB/0KB /s] [7066/137K/0 iops] [eta 00m:00s]
4-rand-rw-3xx: (groupid=0, jobs=16): err= 0: pid=23144: Mon Nov 27 19:56:04 2017
  read : io=245977MB, bw=1592.8MB/s, iops=31315, runt=154436msec
    slat (usec): min=1, max=77, avg=10.34, stdev= 6.00
    clat (usec): min=0, max=6, avg= 0.20, stdev= 0.40
     lat (usec): min=1, max=77, avg=10.57, stdev= 6.01
    clat percentiles (usec):
     |  1.00th=[    0],  5.00th=[    0], 10.00th=[    0], 20.00th=[    0],
     | 30.00th=[    0], 40.00th=[    0], 50.00th=[    0], 60.00th=[    0],
     | 70.00th=[    0], 80.00th=[    1], 90.00th=[    1], 95.00th=[    1],
     | 99.00th=[    1], 99.50th=[    1], 99.90th=[    1], 99.95th=[    1],
     | 99.99th=[    1]
    bw (KB  /s): min=38528, max=118396, per=6.36%, avg=103720.16, stdev=8779.81
  write: io=4559.9GB, bw=30234MB/s, iops=594518, runt=154436msec
    slat (usec): min=1, max=787, avg=16.46, stdev=10.00
    clat (usec): min=0, max=680, avg= 0.20, stdev= 0.41
     lat (usec): min=1, max=789, avg=16.69, stdev=10.00
    clat percentiles (usec):
     |  1.00th=[    0],  5.00th=[    0], 10.00th=[    0], 20.00th=[    0],
     | 30.00th=[    0], 40.00th=[    0], 50.00th=[    0], 60.00th=[    0],
     | 70.00th=[    0], 80.00th=[    1], 90.00th=[    1], 95.00th=[    1],
     | 99.00th=[    1], 99.50th=[    1], 99.90th=[    1], 99.95th=[    1],
     | 99.99th=[    1]
    bw (MB  /s): min=  731, max= 2015, per=6.36%, avg=1922.69, stdev=146.41
    lat (usec) : 2=100.00%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
    lat (usec) : 250=0.01%, 750=0.01%
  cpu          : usr=35.61%, sys=64.37%, ctx=2509, majf=0, minf=930
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=4836244/w=91815119/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=245977MB, aggrb=1592.8MB/s, minb=1592.8MB/s, maxb=1592.8MB/s, mint=154436msec, maxt=154436msec
  WRITE: io=4559.9GB, aggrb=30234MB/s, minb=30234MB/s, maxb=30234MB/s, mint=154436msec, maxt=154436msec

Now I want to keep the customer's I/O pattern but switch to DAX access (the /dev/dax0.0 raw device) with the mmap engine, trying to bypass the OS block device and system call layers.  So I modified the script:

dl560g10spmem01:~/FIO-PMEM_TestScripts/DAX # /usr/bin/fio --name=4-rand-rw-3xx --ioengine=mmap --iodepth=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --bssplit=4k/4:8k/7:16k/7:32k/15:64k/65:128k/1:256k/1 --rwmixread=5 --size=290g --numjobs=16 --group_reporting --runtime=120 --filename=/dev/dax0.0
4-rand-rw-3xx: (g=0): rw=randrw, bs=4K-256K/4K-256K/4K-256K, ioengine=mmap, iodepth=1
...
fio-2.12
Starting 16 processes
Jobs: 12 (f=12): [m(3),_(2),m(1),_(1),m(6),_(1),m(2)] [78.6% done] [1304MB/24907MB/0KB /s] [25.7K/490K/0 iops] [eta 00m:33s]
4-rand-rw-3xx: (groupid=0, jobs=16): err= 0: pid=65032: Tue Nov 28 06:58:09 2017
  read : io=202831MB, bw=1690.3MB/s, iops=33231, runt=120001msec
    clat (usec): min=0, max=211, avg= 8.88, stdev= 5.23
     lat (usec): min=0, max=211, avg= 8.92, stdev= 5.23
    clat percentiles (usec):
     |  1.00th=[    1],  5.00th=[    1], 10.00th=[    2], 20.00th=[    4],
     | 30.00th=[    6], 40.00th=[    8], 50.00th=[   10], 60.00th=[   11],
     | 70.00th=[   11], 80.00th=[   12], 90.00th=[   13], 95.00th=[   14],
     | 99.00th=[   29], 99.50th=[   43], 99.90th=[   48], 99.95th=[   49],
     | 99.99th=[   53]
    bw (KB  /s): min=86408, max=159184, per=6.36%, avg=110086.50, stdev=19409.13
  write: io=3764.3GB, bw=32121MB/s, iops=631607, runt=120001msec
    clat (usec): min=0, max=744, avg=15.16, stdev=10.09
     lat (usec): min=0, max=744, avg=15.21, stdev=10.09
    clat percentiles (usec):
     |  1.00th=[    1],  5.00th=[    2], 10.00th=[    3], 20.00th=[    6],
     | 30.00th=[   11], 40.00th=[   12], 50.00th=[   12], 60.00th=[   20],
     | 70.00th=[   21], 80.00th=[   22], 90.00th=[   23], 95.00th=[   24],
     | 99.00th=[   46], 99.50th=[   86], 99.90th=[   91], 99.95th=[   92],
     | 99.99th=[   96]
    bw (MB  /s): min= 1760, max= 2737, per=6.36%, avg=2043.01, stdev=353.05
    lat (usec) : 2=4.63%, 4=8.45%, 10=11.27%, 20=32.41%, 50=42.58%
    lat (usec) : 100=0.66%, 250=0.01%, 500=0.01%, 750=0.01%
  cpu          : usr=99.74%, sys=0.24%, ctx=2791, majf=0, minf=2822710
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=3987782/w=75793478/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=202831MB, aggrb=1690.3MB/s, minb=1690.3MB/s, maxb=1690.3MB/s, mint=120001msec, maxt=120001msec
  WRITE: io=3764.3GB, aggrb=32121MB/s, minb=32121MB/s, maxb=32121MB/s, mint=120001msec, maxt=120001msec

No huge improvement detected: write latency 16.69 us vs 15.21 us, write bandwidth 30234 MB/s vs 32121 MB/s.
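
For context on the traces below: as I understand it, with ioengine=mmap the device is mapped into each fio process and every I/O is essentially a memcpy() into (or out of) the mapping, so the time shows up as userspace copy code rather than syscalls.  Roughly something like this per write (a simplified sketch of my understanding, not fio's actual code; the 2 MiB mapping length and the 64k copy size are just assumptions):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t map_len = 2UL << 20;     /* 2 MiB mapping; assumes the namespace alignment allows it */
    char buf[64 * 1024];            /* stand-in for one I/O buffer */
    char *dax;
    int fd = open("/dev/dax0.0", O_RDWR);

    if (fd < 0)
        return 1;
    dax = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (dax == MAP_FAILED)
        return 1;
    memset(buf, 0xab, sizeof(buf)); /* stand-in for refill_buffers data */
    memcpy(dax, buf, sizeof(buf));  /* the "write": a plain userspace copy, no syscall per I/O */
    munmap(dax, map_len);
    close(fd);
    return 0;
}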

Looking at the performance traces,

PID 65043  /usr/bin/fio

    HARDCLOCK entries
       Count     Pct  State  Function
        2210  66.29%  USER   __memcpy_avx_unaligned [/lib64/libc-2.22.so]
        1122  33.65%  USER   UNKNOWN
           2   0.06%  SYS    pagerange_is_ram_callback

       Count     Pct  HARDCLOCK Stack trace
       ============================================================
           2   0.06%  pagerange_is_ram_callback  walk_system_ram_range  pat_pagerange_is_ram  lookup_memtype  track_pfn_insert  vmf_insert_pfn_pmd  dax_dev_pmd_fault  handle_mm_fault  __do_page_fault  do_page_fault  page_fault  unknown  |  __memcpy_avx_unaligned

   25.620925 cpu=51 pid=65043 tgid=65043 hardclock state=USER  [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
   25.644924 cpu=51 pid=65043 tgid=65043 hardclock state=USER  [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
   25.656923 cpu=51 pid=65043 tgid=65043 hardclock state=USER  [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
   25.680925 cpu=51 pid=65043 tgid=65043 hardclock state=USER  [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
   25.692924 cpu=51 pid=65043 tgid=65043 hardclock state=USER  [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
   25.704925 cpu=51 pid=65043 tgid=65043 hardclock state=USER  [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
   25.716925 cpu=51 pid=65043 tgid=65043 hardclock state=USER  [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
   25.728923 cpu=51 pid=65043 tgid=65043 hardclock state=USER  [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
   25.764924 cpu=51 pid=65043 tgid=65043 hardclock state=USER  [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6
   25.776924 cpu=51 pid=65043 tgid=65043 hardclock state=USER  [libc-2.22.so]:__memcpy_avx_unaligned+0x2b6

It seems there is a lot of unaligned memory access. Any ideas?

You could also look at the source code for fio to see what it’s doing or put in some printf() statements so you can understand what the actual alignment is.
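
For example, a quick standalone way to see what a given buffer's alignment looks like (just an illustration with a malloc'd stand-in buffer, not fio's actual I/O buffer):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Return the offset of p from the previous 'boundary'-byte boundary; 0 means aligned. */
static unsigned long misalign(const void *p, unsigned long boundary)
{
    return (uintptr_t)p & (boundary - 1);
}

int main(void)
{
    void *buf = malloc(64 * 1024);  /* stand-in for an I/O buffer */

    printf("buf=%p off64=%lu off4k=%lu\n",
           buf, misalign(buf, 64), misalign(buf, 4096));
    free(buf);
    return 0;
}

An equivalent printf() dropped next to the memcpy in the mmap engine would show the real addresses.  If I remember right, fio also has iomem_align/mem_align options that force a particular I/O buffer alignment without touching the source, which might be worth a try.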

Anton

-----Original Message-----
From: Sitsofe Wheeler [mailto:sitsofe@xxxxxxxxx] 
Sent: Monday, November 27, 2017 9:38 PM
To: Gavriliuk, Anton (HPS Ukraine) <anton.gavriliuk@xxxxxxx>
Cc: fio@xxxxxxxxxxxxxxx; Brian Boylston <brian.boylston@xxxxxxx>; Elliott, Robert (Persistent Memory) <elliott@xxxxxxx>
Subject: Re: fio 3.2

Unfortunately I don't have access to a pmem device but let's see how far we get:

On 27 November 2017 at 12:39, Gavriliuk, Anton (HPS Ukraine) <anton.gavriliuk@xxxxxxx> wrote:
>
> result=$(fio --name=random-writers --ioengine=mmap --iodepth=32 
> --rw=randwrite --bs=64k --size=1024m --numjobs=8 --group_reporting=1 
> --eta=never --time_based --runtime=60 --filename=/dev/dax0.0 | grep 
> WRITE)

Please make your problem scenarios as simple as possible:
1. Just run fio normally so we can see the output it produces on both stdout and stderr.
2. Reduce the job so that there are only the bare minimum options that reproduce the problem.
3. Try to avoid changing lots of things.

Here you've switched the ioengine, introducing another place to look.
Instead how about this:

Was your /dev/dax0.0 device made using ndctl? Assuming yes:

fio --name=dax-mmap --ioengine=mmap --rw=write --bs=64k --eta=never --time_based --runtime=60 --filename=/dev/dax0.0 --size=2g

(apparently a size has to be specified when you try to use a character device - see https://nvdimm.wiki.kernel.org/ )

If you run just that by itself what do you see?

Next up:

fio --name=dax-dax --ioengine=dev-dax --rw=write --bs=64k --eta=never --time_based --runtime=60 --filename=/dev/dax0.0 --size=2g

If you run just that by itself what do you see?

Finally:
Assuming a -o dax mounted filesystem on /pmem0/ :

fio --name=libpmemblk --ioengine=pmemblk --rw=write --bs=64k --eta=never --time_based --runtime=60 --filename=/pmem0/fio-test,4096,1024 --thread=1

If you run just that by itself what do you see?

(Perhaps the documentation for these ioengines and pmem devices needs to be improved?)

--
Sitsofe | http://sucs.org/~sits/