[kvm] Fio direct i/o read faster than buffered i/o

Hi all,
I am getting very strange results while testing Ceph performance with fio from a KVM guest VM. My environment is as follows:

  • Ceph version 0.94.5
  • 4x storage nodes, 36 SATA disks each, journals on SSD, 10 Gbps links
  • KVM as hypervisor (qemu-kvm 2.0.0+dfsg-2ubuntu1.22) on Ubuntu 14.04, 10 Gbps link to the Ceph nodes
  • The hypervisor hosts the VM (EL7, kernel 3.10.0-327.13.1.el7.x86_64) using the rbd libvirt driver; part of the VM config is shown below.

From this VM I am trying to measure IOPS. With direct I/O I get around 10-13k IOPS from a random read test, but with buffered I/O only ~600 IOPS. This is what fio is giving me; other tools (iozone, ioping, dd) show more reasonable results (see below).

I have run out of ideas on this. I have tried different clock sources (kvm-clock, tsc) and setting cache=none in the VM’s XML config, but the results are the same. Could somebody explain these results to me? Maybe I am missing something, but I would expect buffered I/O to be faster.
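
For reference, the fio commands were along these lines (the job parameters are taken from the fio output further down; the test file path here is just a placeholder):

# buffered i/o
fio --name=randread --rw=randread --bs=4k --ioengine=libaio --iodepth=32 --size=100m --filename=/root/fio.test

# direct i/o
fio --name=randread --rw=randread --bs=4k --ioengine=libaio --iodepth=32 --size=2g --direct=1 --filename=/root/fio.test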

Here is part of the VM config:

  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <auth username='compute'>
        <secret type='ceph' uuid='key'/>
      </auth>
      <source protocol='rbd' name='compute/e46e94c7-33ec-4a37-a6f7-7f10f81b3910_disk'>
        <host name='ceph-mon1' port='6789'/>
        <host name='ceph-mon2' port='6789'/>
        <host name='ceph-mon3' port='6789'/>
      </source>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
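
The cache=none attempt mentioned above changes only the driver line of this disk element, i.e.:

      <driver name='qemu' type='raw' cache='none'/>
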
  • fio buffered i/o
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.2.8
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [2725KB/0KB/0KB /s] [681/0/0 iops] [eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=4556: Thu Oct 27 18:19:22 2016
  read : io=102400KB, bw=2218.5KB/s, iops=554, runt= 46158msec
    slat (usec): min=113, max=213672, avg=1795.79, stdev=2928.08
    clat (usec): min=5, max=679746, avg=55837.68, stdev=32366.32
     lat (msec): min=1, max=686, avg=57.63, stdev=33.11
    clat percentiles (msec):
     |  1.00th=[   41],  5.00th=[   46], 10.00th=[   47], 20.00th=[   49],
     | 30.00th=[   50], 40.00th=[   51], 50.00th=[   52], 60.00th=[   53],
     | 70.00th=[   54], 80.00th=[   56], 90.00th=[   58], 95.00th=[   61],
     | 99.00th=[  221], 99.50th=[  285], 99.90th=[  478], 99.95th=[  570],
     | 99.99th=[  660]
    bw (KB  /s): min=  190, max= 2810, per=99.80%, avg=2213.46, stdev=452.98
    lat (usec) : 10=0.01%
    lat (msec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.03%, 50=31.26%
    lat (msec) : 100=66.54%, 250=1.37%, 500=0.70%, 750=0.08%
  cpu          : usr=0.49%, sys=1.96%, ctx=25601, majf=0, minf=64
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=25600/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=102400KB, aggrb=2218KB/s, minb=2218KB/s, maxb=2218KB/s, mint=46158msec, maxt=46158msec

Disk stats (read/write):
  vda: ios=25439/16, merge=0/4, ticks=45035/27, in_queue=45052, util=97.89%
  • fio direct i/o
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.2.8
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [46017KB/0KB/0KB /s] [11.6K/0/0 iops] [eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=4587: Thu Oct 27 18:21:24 2016
  read : io=2048.0MB, bw=38256KB/s, iops=9563, runt= 54819msec
    slat (usec): min=2, max=2397, avg=11.70, stdev= 5.67
    clat (usec): min=54, max=778025, avg=3329.71, stdev=9344.22
     lat (usec): min=64, max=778041, avg=3341.82, stdev=9344.30
    clat percentiles (usec):
     |  1.00th=[ 1112],  5.00th=[ 1464], 10.00th=[ 1688], 20.00th=[ 1976],
     | 30.00th=[ 2160], 40.00th=[ 2352], 50.00th=[ 2512], 60.00th=[ 2672],
     | 70.00th=[ 2864], 80.00th=[ 3184], 90.00th=[ 3792], 95.00th=[ 4768],
     | 99.00th=[17792], 99.50th=[38656], 99.90th=[136192], 99.95th=[187392],
     | 99.99th=[366592]
    bw (KB  /s): min= 4672, max=49320, per=100.00%, avg=38354.26, stdev=11599.67
    lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.03%, 1000=0.41%
    lat (msec) : 2=20.66%, 4=70.57%, 10=6.58%, 20=0.84%, 50=0.52%
    lat (msec) : 100=0.22%, 250=0.14%, 500=0.02%, 750=0.01%, 1000=0.01%
  cpu          : usr=4.76%, sys=18.67%, ctx=463114, majf=0, minf=65
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=524288/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=2048.0MB, aggrb=38255KB/s, minb=38255KB/s, maxb=38255KB/s, mint=54819msec, maxt=54819msec

Disk stats (read/write):
  vda: ios=523866/15, merge=0/5, ticks=1737952/27, in_queue=1737947, util=99.88%
  • iozone direct i/o
    Record Size 4 kB
    O_DIRECT feature enabled
    File size set to 204800 kB
    Command line used: iozone -r 4k -I -s 200m
    Output is in kBytes/sec
    Time Resolution = 0.000001 seconds.
    Processor cache size set to 1024 kBytes.
    Processor cache line size set to 32 bytes.
    File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                                   
              kB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
          204800       4   34260   37305     3533     3516    2821   15448    3950    44162     4047  1322073  1943183 3651208  3790895
  • iozone buffered i/o
    Record Size 4 kB
    File size set to 204800 kB
    Command line used: iozone -r 4k -s 200m
    Output is in kBytes/sec
    Time Resolution = 0.000001 seconds.
    Processor cache size set to 1024 kBytes.
    Processor cache line size set to 32 bytes.
    File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                                   
              kB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
          204800       4 1354155 2150220  4865776  4862195 3740709 1838733 4074480  4854364  3955199  1648038  2104247 5054543  4814289
  • dd, ioping
[root@zxc-snap ~]# dd if=/dev/vda of=/dev/null bs=4k count=1000000 iflag=direct
^C45209+0 records in
45208+0 records out
185171968 bytes (185 MB) copied, 44.4084 s, 4.2 MB/s

[root@zxc-snap ~]# dd if=/dev/vda of=/dev/null bs=4k count=1000000
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 16.7044 s, 245 MB/s
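
(For scale: the direct dd run works out to roughly 4.2 MB/s / 4 KiB ≈ 1,000 reads/s, the same order as the ioping -D numbers below, while the 245 MB/s buffered run is sequential, so readahead and the page cache do most of the work there.)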

[root@zxc-snap ~]# ioping /tmp/ -D
4 KiB from /tmp/ (xfs /dev/vda1): request=1 time=265 us
4 KiB from /tmp/ (xfs /dev/vda1): request=2 time=314 us
4 KiB from /tmp/ (xfs /dev/vda1): request=3 time=354 us
4 KiB from /tmp/ (xfs /dev/vda1): request=4 time=262 us
4 KiB from /tmp/ (xfs /dev/vda1): request=5 time=2.27 ms
4 KiB from /tmp/ (xfs /dev/vda1): request=6 time=324 us
4 KiB from /tmp/ (xfs /dev/vda1): request=7 time=559 us
4 KiB from /tmp/ (xfs /dev/vda1): request=8 time=319 us
4 KiB from /tmp/ (xfs /dev/vda1): request=9 time=333 us
4 KiB from /tmp/ (xfs /dev/vda1): request=10 time=238 us
4 KiB from /tmp/ (xfs /dev/vda1): request=11 time=304 us
4 KiB from /tmp/ (xfs /dev/vda1): request=12 time=346 us
4 KiB from /tmp/ (xfs /dev/vda1): request=13 time=584 us
^C
--- /tmp/ (xfs /dev/vda1) ioping statistics ---
13 requests completed in 12.5 s, 2.01 k iops, 7.84 MiB/s
min/avg/max/mdev = 238 us / 498 us / 2.27 ms / 522 us

[root@zxc-snap ~]# ioping /tmp/ -C
4 KiB from /tmp/ (xfs /dev/vda1): request=1 time=14 us
4 KiB from /tmp/ (xfs /dev/vda1): request=2 time=19 us
4 KiB from /tmp/ (xfs /dev/vda1): request=3 time=16 us
4 KiB from /tmp/ (xfs /dev/vda1): request=4 time=18 us
4 KiB from /tmp/ (xfs /dev/vda1): request=5 time=16 us
4 KiB from /tmp/ (xfs /dev/vda1): request=6 time=16 us
4 KiB from /tmp/ (xfs /dev/vda1): request=7 time=17 us
4 KiB from /tmp/ (xfs /dev/vda1): request=8 time=16 us
4 KiB from /tmp/ (xfs /dev/vda1): request=9 time=17 us
4 KiB from /tmp/ (xfs /dev/vda1): request=10 time=17 us
4 KiB from /tmp/ (xfs /dev/vda1): request=11 time=17 us
4 KiB from /tmp/ (xfs /dev/vda1): request=12 time=16 us
4 KiB from /tmp/ (xfs /dev/vda1): request=13 time=30 us
4 KiB from /tmp/ (xfs /dev/vda1): request=14 time=44 us
4 KiB from /tmp/ (xfs /dev/vda1): request=15 time=17 us
4 KiB from /tmp/ (xfs /dev/vda1): request=16 time=18 us
4 KiB from /tmp/ (xfs /dev/vda1): request=17 time=31 us
4 KiB from /tmp/ (xfs /dev/vda1): request=18 time=17 us
4 KiB from /tmp/ (xfs /dev/vda1): request=19 time=16 us
4 KiB from /tmp/ (xfs /dev/vda1): request=20 time=11 us

^C
--- /tmp/ (xfs /dev/vda1) ioping statistics ---
20 requests completed in 19.6 s, 52.2 k iops, 204.0 MiB/s
