Re: All SSD Pool - Odd Performance

It would have been more interesting if you had tweaked only one option, as now we can’t be sure which change had what impact… :-)
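For example, on the proxmox ve 4 setup quoted below one could keep the QEMU version and the iothread setting fixed and vary only the cache mode, using config lines that differ in a single option (a sketch based on the virtio1 line shown further down; these exact combinations were not benchmarked in this thread):

virtio1: ceph_test:vm-102-disk-1,cache=writethrough,iothread=on,size=100G
virtio1: ceph_test:vm-102-disk-1,cache=writeback,iothread=on,size=100G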

On 22 Nov 2015, at 04:29, Udo Lembke <ulembke@xxxxxxxxxxxx> wrote:

Hi Sean,
Haomai is right that qemu can make a huge performance difference.

I have done two tests against the same ceph cluster (different pools, but this should not make any difference).
One test with proxmox ve 4 (qemu 2.4, iothread for the device, and cache=writeback) gives 14856 iops.
The same test with proxmox ve 3.4 (qemu 2.2.1, cache=writethrough) gives only 5070 iops.

Here are the full results:
############### proxmox ve 3.x ###############
kvm --version
QEMU emulator version 2.2.1, Copyright (c) 2003-2008 Fabrice Bellard

VM:
virtio2: ceph_file:vm-405-disk-1,cache=writethrough,backup=no,size=4096G

root@fileserver:/daten/support/test# fio --time_based --name=benchmark --size=4G --filename=/mnt/test.bin --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting
fio: time_based requires a runtime/timeout setting
benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
...
fio-2.1.11
Starting 4 processes
benchmark: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [_(1),w(1),_(2)] [100.0% done] [0KB/40024KB/0KB /s] [0/10.6K/0 iops] [eta 00m:00s]
benchmark: (groupid=0, jobs=4): err= 0: pid=7821: Sun Nov 22 04:07:47 2015
  write: io=16384MB, bw=20282KB/s, iops=5070, runt=827178msec
    slat (usec): min=0, max=2531.7K, avg=778.68, stdev=12757.26
    clat (usec): min=508, max=2755.2K, avg=99980.14, stdev=146967.17
     lat (msec): min=1, max=2755, avg=100.76, stdev=147.54
    clat percentiles (msec):
     |  1.00th=[   10],  5.00th=[   14], 10.00th=[   19], 20.00th=[   28],
     | 30.00th=[   36], 40.00th=[   43], 50.00th=[   51], 60.00th=[   63],
     | 70.00th=[   81], 80.00th=[  128], 90.00th=[  237], 95.00th=[  367],
     | 99.00th=[  717], 99.50th=[  889], 99.90th=[ 1516], 99.95th=[ 1713],
     | 99.99th=[ 2573]
    bw (KB  /s): min=    4, max=30726, per=26.90%, avg=5456.84, stdev=3014.45
    lat (usec) : 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.01%, 10=1.11%, 20=10.18%, 50=37.74%
    lat (msec) : 100=26.45%, 250=15.22%, 500=6.66%, 750=1.74%, 1000=0.55%
    lat (msec) : 2000=0.29%, >=2000=0.03%
  cpu          : usr=0.36%, sys=2.31%, ctx=1148702, majf=0, minf=30
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=0/w=4194304/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
  WRITE: io=16384MB, aggrb=20282KB/s, minb=20282KB/s, maxb=20282KB/s, mint=827178msec, maxt=827178msec

Disk stats (read/write):
    dm-0: ios=0/4483641, merge=0/0, ticks=0/104928824, in_queue=105927128, util=100.00%, aggrios=1/4469640, aggrmerge=0/14788, aggrticks=64/103711096, aggrin_queue=104165356, aggrutil=100.00%
  vda: ios=1/4469640, merge=0/14788, ticks=64/103711096, in_queue=104165356, util=100.00%

##############################################

############### proxmox ve 4.x ###############
kvm --version
QEMU emulator version 2.4.0.1 pve-qemu-kvm_2.4-12, Copyright (c) 2003-2008 Fabrice Bellard

grep ceph /etc/pve/qemu-server/102.conf
virtio1: ceph_test:vm-102-disk-1,cache=writeback,iothread=on,size=100G

root@fileserver-test:/daten/tv01/test# fio --time_based --name=benchmark --size=4G --filename=/mnt/test.bin --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting          
fio: time_based requires a runtime/timeout setting                                                                                      
benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128                                                             
...                                                                                                                                                
fio-2.1.11
Starting 4 processes
Jobs: 4 (f=4): [w(4)] [99.6% done] [0KB/56148KB/0KB /s] [0/14.4K/0 iops] [eta 00m:01s]
benchmark: (groupid=0, jobs=4): err= 0: pid=26131: Sun Nov 22 03:51:04 2015
  write: io=0B, bw=59425KB/s, iops=14856, runt=282327msec
    slat (usec): min=6, max=216925, avg=261.78, stdev=1802.78
    clat (msec): min=1, max=330, avg=34.04, stdev=27.78
     lat (msec): min=1, max=330, avg=34.30, stdev=27.87
    clat percentiles (msec):
     |  1.00th=[   10],  5.00th=[   13], 10.00th=[   14], 20.00th=[   16],
     | 30.00th=[   18], 40.00th=[   19], 50.00th=[   21], 60.00th=[   24],
     | 70.00th=[   33], 80.00th=[   62], 90.00th=[   81], 95.00th=[   87],
     | 99.00th=[   95], 99.50th=[  100], 99.90th=[  269], 99.95th=[  277],
     | 99.99th=[  297]
    bw (KB  /s): min=    3, max=42216, per=25.10%, avg=14917.03, stdev=2990.50
    lat (msec) : 2=0.01%, 4=0.01%, 10=1.13%, 20=45.52%, 50=28.23%
    lat (msec) : 100=24.61%, 250=0.35%, 500=0.16%
  cpu          : usr=2.20%, sys=14.42%, ctx=2462199, majf=0, minf=40
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=0/w=4194304/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
  WRITE: io=16384MB, aggrb=59424KB/s, minb=59424KB/s, maxb=59424KB/s, mint=282327msec, maxt=282327msec

Disk stats (read/write):
    dm-0: ios=0/4192044, merge=0/0, ticks=0/35093432, in_queue=35116888, util=99.70%, aggrios=0/4194626, aggrmerge=0/14, aggrticks=0/34902692, aggrin_queue=34903976, aggrutil=99.65%
  vda: ios=0/4194626, merge=0/14, ticks=0/34902692, in_queue=34903976, util=99.65%
##############################################

regards

Udo

On 19.11.2015 11:46, Sean Redmond wrote:
Hi Mike/Warren,

Thanks for helping out here. I am running the below fio command to test this with 4 jobs and an iodepth of 128:

fio --time_based --name=benchmark --size=4G --filename=/mnt/test.bin --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting
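Note that fio reports "time_based requires a runtime/timeout setting" in all of the runs quoted in this thread, so --time_based is effectively ignored and each job simply writes the full 4G file. For a genuinely time-bounded run, a runtime would need to be added, for example (a sketch, not what was actually run here):

fio --time_based --runtime=60 --name=benchmark --size=4G --filename=/mnt/test.bin --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting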

The QEMU instance is created using Nova; the settings I can see in the config are below:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <auth username='$$'>
        <secret type='ceph' uuid='$$'/>
      </auth>
      <source protocol='rbd' name='ssd_volume/volume-$$'>
        <host name='$$' port='6789'/>
        <host name='$$' port='6789'/>
        <host name='$$' port='6789'/>
      </source>
      <target dev='vde' bus='virtio'/>
      <serial>$$</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </disk>
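For comparison with the iothread result Udo posted above, libvirt can also pin a virtio disk to a dedicated I/O thread; a minimal sketch (assuming the domain XML is edited directly rather than generated by Nova, and a libvirt new enough to support iothreads) would look something like:

  <iothreads>1</iothreads>
  ...
  <disk type='network' device='disk'>
    <driver name='qemu' type='raw' cache='writeback' iothread='1'/>
    ...
  </disk>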


The below shows the output from running Fio:

# fio --time_based --name=benchmark --size=4G --filename=/mnt/test.bin --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting
fio: time_based requires a runtime/timeout setting
benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
...
benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
fio-2.0.13
Starting 4 processes
Jobs: 3 (f=3): [_www] [99.7% done] [0K/36351K/0K /s] [0 /9087 /0  iops] [eta 00m:03s]
benchmark: (groupid=0, jobs=4): err= 0: pid=8547: Thu Nov 19 05:16:31 2015
  write: io=16384MB, bw=19103KB/s, iops=4775 , runt=878269msec
    slat (usec): min=4 , max=2339.4K, avg=807.17, stdev=12460.02
    clat (usec): min=1 , max=2469.6K, avg=106265.05, stdev=138893.39
     lat (usec): min=67 , max=2469.8K, avg=107073.04, stdev=139377.68
    clat percentiles (usec):
     |  1.00th=[ 1928],  5.00th=[ 9408], 10.00th=[12352], 20.00th=[18816],
     | 30.00th=[43776], 40.00th=[64768], 50.00th=[78336], 60.00th=[89600],
     | 70.00th=[102912], 80.00th=[123392], 90.00th=[216064], 95.00th=[370688],
     | 99.00th=[733184], 99.50th=[782336], 99.90th=[1044480], 99.95th=[2088960],
     | 99.99th=[2342912]
    bw (KB/s)  : min=    4, max=14968, per=26.11%, avg=4987.39, stdev=1947.67
    lat (usec) : 2=0.01%, 20=0.01%, 50=0.01%, 100=0.05%, 250=0.30%
    lat (usec) : 500=0.24%, 750=0.11%, 1000=0.08%
    lat (msec) : 2=0.23%, 4=0.46%, 10=4.47%, 20=15.08%, 50=11.28%
    lat (msec) : 100=35.47%, 250=23.52%, 500=5.92%, 750=1.96%, 1000=0.70%
    lat (msec) : 2000=0.06%, >=2000=0.06%
  cpu          : usr=0.62%, sys=2.42%, ctx=1602209, majf=1, minf=101
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=0/w=4194304/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=16384MB, aggrb=19102KB/s, minb=19102KB/s, maxb=19102KB/s, mint=878269msec, maxt=878269msec

Disk stats (read/write):
  vde: ios=1119/4330437, merge=0/105599, ticks=556/121755054, in_queue=121749666, util=99.86

The below shows lspci from within the guest:

# lspci | grep -i scsi
00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device

Thanks

On Wed, Nov 18, 2015 at 7:05 PM, Warren Wang - ISD <Warren.Wang@xxxxxxxxxxx> wrote:
What were you using for iodepth and numjobs? If you’re getting an average of 2ms per operation, and you’re single threaded, I’d expect about 500 IOPS / thread, until you hit the limit of your QEMU setup, which may be a single IO thread. That’s also what I think Mike is alluding to.
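(The arithmetic behind that estimate: a single outstanding I/O that takes about 2 ms to complete gives at most 1 s / 2 ms = 500 IOPS. More generally, achievable IOPS is roughly the number of operations actually in flight divided by the average latency, so a large iodepth in fio only helps if the whole path down to the OSDs can really keep that many operations outstanding.)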

Warren

From: Sean Redmond <sean.redmond1@xxxxxxxxx>
Date: Wednesday, November 18, 2015 at 6:39 AM
To: "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
Subject: All SSD Pool - Odd Performance

Hi,

I have a performance question for anyone running an SSD only pool. Let me detail the setup first.

12 x Dell PowerEdge R630 (2 x 2620v3, 64GB RAM)
8 x Intel DC S3710 800GB
Dual-port Solarflare 10Gb/s NIC (one front and one back)
Ceph 0.94.5
Ubuntu 14.04 (3.13.0-68-generic)

The above is in one pool that is used for QEMU guests. A 4k fio test on the SSD directly yields around 55k IOPS; the same test inside a QEMU guest seems to hit a limit of around 4k IOPS. If I deploy multiple guests, they can all reach 4k IOPS simultaneously.

I don't see any evidence of a bottleneck on the OSD hosts. Is this limit inside the guest expected, or am I just not looking deeply enough yet?
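One way to tell whether the ceiling is in the guest I/O path rather than in the pool itself would be to run the same 4k random-write test against an RBD image directly from a client node, bypassing QEMU entirely. A sketch using fio's rbd ioengine (this assumes fio was built with rbd support, a throwaway test image called fio-test in the ssd_volume pool, and the client.admin keyring; the names are placeholders):

rbd create --size 10240 ssd_volume/fio-test
fio --name=rbd-bench --ioengine=rbd --clientname=admin --pool=ssd_volume --rbdname=fio-test --rw=randwrite --blocksize=4k --iodepth=128 --numjobs=4 --runtime=60 --time_based --group_reporting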

Thanks

This email and any files transmitted with it are confidential and intended solely for the individual or entity to whom they are addressed. If you have received this email in error destroy it immediately. *** Walmart Confidential ***




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
