Hi all,
I am getting very strange results while testing Ceph performance with fio from a KVM guest VM. My environment is as follows:
- Ceph version 0.94.5
- 4x storage nodes, 36 SATA disks each, journals on SSD, 10 Gbps links
- KVM as hypervisor (qemu-kvm 2.0.0+dfsg-2ubuntu1.22) on Ubuntu 14.04, 10 Gbps link to Ceph nodes.
- The hypervisor hosts a VM (EL7, kernel 3.10.0-327.13.1.el7.x86_64) using the libvirt rbd driver. Part of the VM config is shown below.
From this VM I am trying to measure IOPS. With direct I/O I get around 10-13k IOPS from a random read test, but with buffered I/O only ~600 IOPS. That is what fio reports; other tools (iozone, ioping, dd) show more reasonable results (see below).
I have run out of ideas on this. I have tried different clock sources (kvm-clock, tsc) and setting cache=none in the VM's XML config, but the results are the same. Could somebody explain these results to me? Maybe I am missing something, but I would expect buffered I/O to be faster.
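For reference, the two fio runs were invoked roughly like this (the file name is a placeholder; block size, queue depth, engine, and total size match the output below):

# buffered (page cache) random read -- produces the ~600 IOPS result
fio --name=randread --rw=randread --bs=4k --ioengine=libaio --iodepth=32 --size=100m --filename=/root/fio.test

# same job with O_DIRECT -- produces the 10-13k IOPS result
fio --name=randread --rw=randread --bs=4k --ioengine=libaio --iodepth=32 --size=2g --direct=1 --filename=/root/fio.test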
Here is the relevant part of the VM config:
<devices>
  <emulator>/usr/bin/kvm</emulator>
  <disk type='network' device='disk'>
    <driver name='qemu' type='raw' cache='writeback'/>
    <auth username='compute'>
      <secret type='ceph' uuid='key'/>
    </auth>
    <source protocol='rbd' name='compute/e46e94c7-33ec-4a37-a6f7-7f10f81b3910_disk'>
      <host name='ceph-mon1' port='6789'/>
      <host name='ceph-mon2' port='6789'/>
      <host name='ceph-mon3' port='6789'/>
    </source>
    <backingStore/>
    <target dev='vda' bus='virtio'/>
  </disk>
</devices>
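The cache=none variant I mentioned above only changes the driver line; the rest of the disk definition stays the same:

<driver name='qemu' type='raw' cache='none'/>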
- fio buffered I/O
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.2.8
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [2725KB/0KB/0KB /s] [681/0/0 iops] [eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=4556: Thu Oct 27 18:19:22 2016
read : io=102400KB, bw=2218.5KB/s, iops=554, runt= 46158msec
slat (usec): min=113, max=213672, avg=1795.79, stdev=2928.08
clat (usec): min=5, max=679746, avg=55837.68, stdev=32366.32
lat (msec): min=1, max=686, avg=57.63, stdev=33.11
clat percentiles (msec):
| 1.00th=[ 41], 5.00th=[ 46], 10.00th=[ 47], 20.00th=[ 49],
| 30.00th=[ 50], 40.00th=[ 51], 50.00th=[ 52], 60.00th=[ 53],
| 70.00th=[ 54], 80.00th=[ 56], 90.00th=[ 58], 95.00th=[ 61],
| 99.00th=[ 221], 99.50th=[ 285], 99.90th=[ 478], 99.95th=[ 570],
| 99.99th=[ 660]
bw (KB /s): min= 190, max= 2810, per=99.80%, avg=2213.46, stdev=452.98
lat (usec) : 10=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.03%, 50=31.26%
lat (msec) : 100=66.54%, 250=1.37%, 500=0.70%, 750=0.08%
cpu : usr=0.49%, sys=1.96%, ctx=25601, majf=0, minf=64
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=25600/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: io=102400KB, aggrb=2218KB/s, minb=2218KB/s, maxb=2218KB/s, mint=46158msec, maxt=46158msec
Disk stats (read/write):
vda: ios=25439/16, merge=0/4, ticks=45035/27, in_queue=45052, util=97.89%
- fio direct I/O
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.2.8
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [46017KB/0KB/0KB /s] [11.6K/0/0 iops] [eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=4587: Thu Oct 27 18:21:24 2016
read : io=2048.0MB, bw=38256KB/s, iops=9563, runt= 54819msec
slat (usec): min=2, max=2397, avg=11.70, stdev= 5.67
clat (usec): min=54, max=778025, avg=3329.71, stdev=9344.22
lat (usec): min=64, max=778041, avg=3341.82, stdev=9344.30
clat percentiles (usec):
| 1.00th=[ 1112], 5.00th=[ 1464], 10.00th=[ 1688], 20.00th=[ 1976],
| 30.00th=[ 2160], 40.00th=[ 2352], 50.00th=[ 2512], 60.00th=[ 2672],
| 70.00th=[ 2864], 80.00th=[ 3184], 90.00th=[ 3792], 95.00th=[ 4768],
| 99.00th=[17792], 99.50th=[38656], 99.90th=[136192], 99.95th=[187392],
| 99.99th=[366592]
bw (KB /s): min= 4672, max=49320, per=100.00%, avg=38354.26, stdev=11599.67
lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.03%, 1000=0.41%
lat (msec) : 2=20.66%, 4=70.57%, 10=6.58%, 20=0.84%, 50=0.52%
lat (msec) : 100=0.22%, 250=0.14%, 500=0.02%, 750=0.01%, 1000=0.01%
cpu : usr=4.76%, sys=18.67%, ctx=463114, majf=0, minf=65
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=524288/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: io=2048.0MB, aggrb=38255KB/s, minb=38255KB/s, maxb=38255KB/s, mint=54819msec, maxt=54819msec
Disk stats (read/write):
vda: ios=523866/15, merge=0/5, ticks=1737952/27, in_queue=1737947, util=99.88%
- iozone direct I/O
Record Size 4 kB
O_DIRECT feature enabled
File size set to 204800 kB
Command line used: iozone -r 4k -I -s 200m
Output is in kBytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 kBytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
                                                                random    random      bkwd    record    stride
        kB    reclen     write   rewrite      read    reread      read     write      read   rewrite      read    fwrite  frewrite     fread   freread
    204800         4     34260     37305      3533      3516      2821     15448      3950     44162      4047   1322073   1943183   3651208   3790895
- iozone buffered I/O
Record Size 4 kB
File size set to 204800 kB
Command line used: iozone -r 4k -s 200m
Output is in kBytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 kBytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
                                                                random    random      bkwd    record    stride
        kB    reclen     write   rewrite      read    reread      read     write      read   rewrite      read    fwrite  frewrite     fread   freread
    204800         4   1354155   2150220   4865776   4862195   3740709   1838733   4074480   4854364   3955199   1648038   2104247   5054543   4814289
- dd and ioping
[root@zxc-snap ~]# dd if=/dev/vda of=/dev/null bs=4k count=1000000 iflag=direct
^C45209+0 records in
45208+0 records out
185171968 bytes (185 MB) copied, 44.4084 s, 4.2 MB/s
[root@zxc-snap ~]# dd if=/dev/vda of=/dev/null bs=4k count=1000000
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 16.7044 s, 245 MB/s
[root@zxc-snap ~]# ioping /tmp/ -D
4 KiB from /tmp/ (xfs /dev/vda1): request=1 time=265 us
4 KiB from /tmp/ (xfs /dev/vda1): request=2 time=314 us
4 KiB from /tmp/ (xfs /dev/vda1): request=3 time=354 us
4 KiB from /tmp/ (xfs /dev/vda1): request=4 time=262 us
4 KiB from /tmp/ (xfs /dev/vda1): request=5 time=2.27 ms
4 KiB from /tmp/ (xfs /dev/vda1): request=6 time=324 us
4 KiB from /tmp/ (xfs /dev/vda1): request=7 time=559 us
4 KiB from /tmp/ (xfs /dev/vda1): request=8 time=319 us
4 KiB from /tmp/ (xfs /dev/vda1): request=9 time=333 us
4 KiB from /tmp/ (xfs /dev/vda1): request=10 time=238 us
4 KiB from /tmp/ (xfs /dev/vda1): request=11 time=304 us
4 KiB from /tmp/ (xfs /dev/vda1): request=12 time=346 us
4 KiB from /tmp/ (xfs /dev/vda1): request=13 time=584 us
^C
--- /tmp/ (xfs /dev/vda1) ioping statistics ---
13 requests completed in 12.5 s, 2.01 k iops, 7.84 MiB/s
min/avg/max/mdev = 238 us / 498 us / 2.27 ms / 522 us
[root@zxc-snap ~]# ioping /tmp/ -C
4 KiB from /tmp/ (xfs /dev/vda1): request=1 time=14 us
4 KiB from /tmp/ (xfs /dev/vda1): request=2 time=19 us
4 KiB from /tmp/ (xfs /dev/vda1): request=3 time=16 us
4 KiB from /tmp/ (xfs /dev/vda1): request=4 time=18 us
4 KiB from /tmp/ (xfs /dev/vda1): request=5 time=16 us
4 KiB from /tmp/ (xfs /dev/vda1): request=6 time=16 us
4 KiB from /tmp/ (xfs /dev/vda1): request=7 time=17 us
4 KiB from /tmp/ (xfs /dev/vda1): request=8 time=16 us
4 KiB from /tmp/ (xfs /dev/vda1): request=9 time=17 us
4 KiB from /tmp/ (xfs /dev/vda1): request=10 time=17 us
4 KiB from /tmp/ (xfs /dev/vda1): request=11 time=17 us
4 KiB from /tmp/ (xfs /dev/vda1): request=12 time=16 us
4 KiB from /tmp/ (xfs /dev/vda1): request=13 time=30 us
4 KiB from /tmp/ (xfs /dev/vda1): request=14 time=44 us
4 KiB from /tmp/ (xfs /dev/vda1): request=15 time=17 us
4 KiB from /tmp/ (xfs /dev/vda1): request=16 time=18 us
4 KiB from /tmp/ (xfs /dev/vda1): request=17 time=31 us
4 KiB from /tmp/ (xfs /dev/vda1): request=18 time=17 us
4 KiB from /tmp/ (xfs /dev/vda1): request=19 time=16 us
4 KiB from /tmp/ (xfs /dev/vda1): request=20 time=11 us
^C
--- /tmp/ (xfs /dev/vda1) ioping statistics ---
20 requests completed in 19.6 s, 52.2 k iops, 204.0 MiB/s