Re: benchmark Ceph

Can you post the fio results with ioengine=libaio? From what you posted, it seems to me that the read test hit the cache, and the write performance was not good: the latency was too high (up to ~35.4ms) even though numjobs and iodepth were both 1. Did you monitor system stats on both sides (VM/compute node and cluster)?
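For reference, the requested in-VM run might look like the sketch below. This is only an illustration: the device path /dev/vdb, size and runtime are placeholders to adjust for your VM, and the command needs a live disk to run against.

```shell
# Hypothetical in-VM fio run with ioengine=libaio (placeholders: /dev/vdb, size, runtime).
# Mirrors the rbd-engine parameters from the original post for comparability.
fio --name=test --ioengine=libaio --filename=/dev/vdb \
    --numjobs=1 --runtime=30 --direct=1 --size=2G \
    --rw=write --bs=4k --iodepth=1
```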




------------------ Original ------------------
From: "Tony Liu" <tonyliu0592@xxxxxxxxxxx>
Date: Sep 15, 2020
To: "ceph-users" <ceph-users@xxxxxxx>

Subject: benchmark Ceph



Hi,

I have a 3-OSD-node Ceph cluster with 1 x 480GB SSD and 8 x 2TB
12Gbps SAS HDDs on each node, providing storage to an OpenStack
cluster. Both the public and cluster networks are 2x10G. The WAL
and DB of each OSD are on the SSD, sharing the same 60GB partition.
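As a side note on the sizing above: a 60GB DB partition per 2TB HDD works out to a 3% ratio, which sits within the commonly cited range of a few percent of the data device for BlueStore DB sizing. A quick arithmetic sketch (assuming 2TB = 2000GB for simplicity):

```shell
# DB-to-data ratio: 60GB DB partition per 2TB (~2000GB) HDD
awk 'BEGIN { printf "%.1f%%\n", 60 / 2000 * 100 }'   # prints 3.0%
```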

I ran fio with different combinations of operation, block size and
io-depth to collect IOPS, bandwidth and latency. I tried fio on the
compute node with ioengine=rbd, and also fio within a VM (backed by
Ceph) with ioengine=libaio.

The results don't seem good. Here are a couple of examples.
====================================
fio --name=test --ioengine=rbd --clientname=admin \
    --pool=benchmark --rbdname=test --numjobs=1 \
    --runtime=30 --direct=1 --size=2G \
    --rw=read --bs=4k --iodepth=1

test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=0): [f(1)][100.0%][r=27.6MiB/s,w=0KiB/s][r=7075,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=56310: Mon Sep 14 19:01:24 2020
   read: IOPS=7610, BW=29.7MiB/s (31.2MB/s)(892MiB/30001msec)
    slat (nsec): min=1550, max=57662, avg=3312.74, stdev=2981.42
    clat (usec): min=77, max=4799, avg=127.39, stdev=39.88
     lat (usec): min=78, max=4812, avg=130.70, stdev=40.67
    clat percentiles (usec):
     |  1.00th=[   82],  5.00th=[   86], 10.00th=[   95], 20.00th=[   98],
     | 30.00th=[  100], 40.00th=[  104], 50.00th=[  116], 60.00th=[  129],
     | 70.00th=[  141], 80.00th=[  157], 90.00th=[  182], 95.00th=[  198],
     | 99.00th=[  233], 99.50th=[  245], 99.90th=[  359], 99.95th=[  515],
     | 99.99th=[  709]
   bw (  KiB/s): min=27160, max=40696, per=100.00%, avg=30474.29, stdev=2826.23, samples=59
   iops        : min= 6790, max=10174, avg=7618.56, stdev=706.56, samples=59
  lat (usec)   : 100=28.89%, 250=70.72%, 500=0.34%, 750=0.05%, 1000=0.01%
  lat (msec)   : 2=0.01%, 10=0.01%
  cpu          : usr=3.55%, sys=3.80%, ctx=228358, majf=0, minf=29
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=228333,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=29.7MiB/s (31.2MB/s), 29.7MiB/s-29.7MiB/s (31.2MB/s-31.2MB/s), io=892MiB (935MB), run=30001-30001msec

Disk stats (read/write):
    dm-0: ios=290/3, merge=0/0, ticks=2427/19, in_queue=2446, util=0.95%, aggrios=290/4, aggrmerge=0/0, aggrticks=2427/39, aggrin_queue=2332, aggrutil=0.95%
  sda: ios=290/4, merge=0/0, ticks=2427/39, in_queue=2332, util=0.95%
====================================
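(Not part of the original post: the read numbers are at least internally consistent. At iodepth=1, bandwidth is IOPS times block size, and average latency is roughly the inverse of IOPS, which is a quick way to spot reporting errors.)

```shell
# 7610 IOPS x 4KiB blocks, in MiB/s -- matches the reported 29.7MiB/s
awk 'BEGIN { printf "%.1f MiB/s\n", 7610 * 4096 / 1048576 }'
# at iodepth=1, avg latency ~= 1/IOPS -- close to the reported 130.70 usec
awk 'BEGIN { printf "%.0f usec\n", 1e6 / 7610 }'
```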
====================================
fio --name=test --ioengine=rbd --clientname=admin \
    --pool=benchmark --rbdname=test --numjobs=1 \
    --runtime=30 --direct=1 --size=2G \
    --rw=write --bs=4k --iodepth=1

test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=6352KiB/s][r=0,w=1588 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=56544: Mon Sep 14 19:03:36 2020
  write: IOPS=1604, BW=6417KiB/s (6571kB/s)(188MiB/30003msec)
    slat (nsec): min=2240, max=45925, avg=6526.95, stdev=3486.19
    clat (usec): min=399, max=35411, avg=615.88, stdev=231.41
     lat (usec): min=402, max=35421, avg=622.40, stdev=232.08
    clat percentiles (usec):
     |  1.00th=[  420],  5.00th=[  449], 10.00th=[  469], 20.00th=[  498],
     | 30.00th=[  529], 40.00th=[  562], 50.00th=[  611], 60.00th=[  652],
     | 70.00th=[  685], 80.00th=[  709], 90.00th=[  766], 95.00th=[  799],
     | 99.00th=[  881], 99.50th=[  955], 99.90th=[ 2671], 99.95th=[ 3097],
     | 99.99th=[ 3785]
   bw (  KiB/s): min= 5944, max= 6792, per=100.00%, avg=6415.95, stdev=178.72, samples=60
   iops        : min= 1486, max= 1698, avg=1603.93, stdev=44.67, samples=60
  lat (usec)   : 500=20.82%, 750=67.23%, 1000=11.55%
  lat (msec)   : 2=0.25%, 4=0.14%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=1.22%, sys=1.25%, ctx=48143, majf=0, minf=18
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,48129,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=6417KiB/s (6571kB/s), 6417KiB/s-6417KiB/s (6571kB/s-6571kB/s), io=188MiB (197MB), run=30003-30003msec

Disk stats (read/write):
    dm-0: ios=31/2, merge=0/0, ticks=342/14, in_queue=356, util=0.12%, aggrios=33/3, aggrmerge=0/0, aggrticks=390/27, aggrin_queue=404, aggrutil=0.13%
  sda: ios=33/3, merge=0/0, ticks=390/27, in_queue=404, util=0.13%
====================================
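(Again just arithmetic, not from the original post: the write result is also self-consistent. With iodepth=1, the ~622us average latency alone caps throughput at about 1600 IOPS, so the ~6.4MB/s figure follows directly from per-op latency rather than disk bandwidth.)

```shell
# at iodepth=1, IOPS ~= 1 / avg latency; 622.40 usec -> ~1607 IOPS (reported: 1604)
awk 'BEGIN { printf "%.0f IOPS\n", 1e6 / 622.40 }'
# 1604 IOPS x 4KiB, in KiB/s -- 6416, matching the reported 6417KiB/s
awk 'BEGIN { printf "%.0f KiB/s\n", 1604 * 4096 / 1024 }'
```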

Does that make sense? How do you benchmark your Ceph cluster?
I'd appreciate it if you could share your experiences here.

Thanks!
Tony
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx