We have had a lot of QEMU performance related threads on this mailing list; you may get some insight from those discussions. You could also run rbd bench-write to see how many IOPS you can get outside the VM.
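As a rough sketch (using the ssd_volume pool from your libvirt config quoted below; the bench-test image name is just a placeholder, and the exact options may differ a little between rbd versions), something like this creates a 10GB scratch image, does 4k writes from 16 client threads straight through librbd with no QEMU/virtio in the path, and then removes the image:

# rbd create ssd_volume/bench-test --size 10240
# rbd bench-write ssd_volume/bench-test --io-size 4096 --io-threads 16
# rbd rm ssd_volume/bench-test

If that gets you well above the ~4k IOPS you see inside the guest, the ceiling is more likely in the QEMU/virtio layer (for example a single IO thread per disk) than on the OSD side.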
On Thu, Nov 19, 2015 at 6:46 PM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
> Hi Mike/Warren,
>
> Thanks for helping out here. I am running the below fio command to test this
> with 4 jobs and an iodepth of 128:
>
> fio --time_based --name=benchmark --size=4G --filename=/mnt/test.bin
> --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1
> --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k
> --group_reporting
>
> The QEMU instance is created using nova; the settings I can see in the
> config are below:
>
>     <disk type='network' device='disk'>
>       <driver name='qemu' type='raw' cache='writeback'/>
>       <auth username='$$'>
>         <secret type='ceph' uuid='$$'/>
>       </auth>
>       <source protocol='rbd' name='ssd_volume/volume-$$'>
>         <host name='$$' port='6789'/>
>         <host name='$$' port='6789'/>
>         <host name='$$' port='6789'/>
>       </source>
>       <target dev='vde' bus='virtio'/>
>       <serial>$$</serial>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
>     </disk>
>
> The below shows the output from running fio:
>
> # fio --time_based --name=benchmark --size=4G --filename=/mnt/test.bin
> --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1
> --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k
> --group_reporting
> fio: time_based requires a runtime/timeout setting
> benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
> ...
> benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
> fio-2.0.13
> Starting 4 processes
> Jobs: 3 (f=3): [_www] [99.7% done] [0K/36351K/0K /s] [0 /9087 /0 iops] [eta 00m:03s]
> benchmark: (groupid=0, jobs=4): err= 0: pid=8547: Thu Nov 19 05:16:31 2015
>   write: io=16384MB, bw=19103KB/s, iops=4775, runt=878269msec
>     slat (usec): min=4, max=2339.4K, avg=807.17, stdev=12460.02
>     clat (usec): min=1, max=2469.6K, avg=106265.05, stdev=138893.39
>      lat (usec): min=67, max=2469.8K, avg=107073.04, stdev=139377.68
>     clat percentiles (usec):
>      |  1.00th=[ 1928],  5.00th=[ 9408], 10.00th=[12352], 20.00th=[18816],
>      | 30.00th=[43776], 40.00th=[64768], 50.00th=[78336], 60.00th=[89600],
>      | 70.00th=[102912], 80.00th=[123392], 90.00th=[216064], 95.00th=[370688],
>      | 99.00th=[733184], 99.50th=[782336], 99.90th=[1044480], 99.95th=[2088960],
>      | 99.99th=[2342912]
>     bw (KB/s)  : min=4, max=14968, per=26.11%, avg=4987.39, stdev=1947.67
>     lat (usec) : 2=0.01%, 20=0.01%, 50=0.01%, 100=0.05%, 250=0.30%
>     lat (usec) : 500=0.24%, 750=0.11%, 1000=0.08%
>     lat (msec) : 2=0.23%, 4=0.46%, 10=4.47%, 20=15.08%, 50=11.28%
>     lat (msec) : 100=35.47%, 250=23.52%, 500=5.92%, 750=1.96%, 1000=0.70%
>     lat (msec) : 2000=0.06%, >=2000=0.06%
>   cpu          : usr=0.62%, sys=2.42%, ctx=1602209, majf=1, minf=101
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
>      issued    : total=r=0/w=4194304/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
>   WRITE: io=16384MB, aggrb=19102KB/s, minb=19102KB/s, maxb=19102KB/s, mint=878269msec, maxt=878269msec
>
> Disk stats (read/write):
>   vde: ios=1119/4330437, merge=0/105599, ticks=556/121755054, in_queue=121749666, util=99.86%
>
> The below shows lspci from within the guest:
>
> # lspci | grep -i scsi
> 00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
>
> Thanks
>
> On Wed, Nov 18, 2015 at 7:05 PM, Warren Wang - ISD <Warren.Wang@xxxxxxxxxxx> wrote:
>>
>> What were you using for iodepth and numjobs? If you’re getting an average
>> of 2ms per operation, and you’re single threaded, I’d expect about 500 IOPS
>> per thread, until you hit the limit of your QEMU setup, which may be a single
>> IO thread. That’s also what I think Mike is alluding to.
>>
>> Warren
>>
>> From: Sean Redmond <sean.redmond1@xxxxxxxxx>
>> Date: Wednesday, November 18, 2015 at 6:39 AM
>> To: "ceph-users@xxxxxxxx" <ceph-users@xxxxxxxx>
>> Subject: All SSD Pool - Odd Performance
>>
>> Hi,
>>
>> I have a performance question for anyone running an SSD-only pool. Let me
>> detail the setup first:
>>
>> 12 x Dell PowerEdge R630 (2 x 2620v3, 64GB RAM)
>> 8 x Intel DC 3710 800GB
>> Dual-port Solarflare 10Gb/s NIC (one front and one back)
>> Ceph 0.94.5
>> Ubuntu 14.04 (3.13.0-68-generic)
>>
>> The above is in one pool that is used for QEMU guests. A 4k fio test on
>> the SSD directly yields around 55k IOPS; the same test inside a QEMU guest
>> seems to hit a limit around 4k IOPS. If I deploy multiple guests they can
>> all reach 4k IOPS simultaneously.
>>
>> I don't see any evidence of a bottleneck on the OSD hosts. Is this limit
>> inside the guest expected, or am I just not looking deep enough yet?
>>
>> Thanks
>

--
Best Regards,

Wheat
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com