By "maximum write iops of an osd" I mean total iops divided by the number
of OSDs. For example, an expensive setup from Micron
(https://www.micron.com/about/blog/2018/april/micron-9200-max-red-hat-ceph-storage-30-reference-architecture-block-performance)
achieves only 8750 peak write iops per NVMe. The exact NVMes they used are
rated for 260000+ iops when connected directly :). CPU is a real
bottleneck. The need for a Seastar-based rewrite is not a joke! :)
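(Purely illustrative arithmetic, not numbers from the Micron paper: if a
cluster of 20 OSDs delivers 200000 total 4k random write iops, that is
200000 / 20 = 10000 write iops per OSD.)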
Total iops is the number coming from a test like:
fio -ioengine=rbd -direct=1 -name=test -bs=4k -iodepth=128 -rw=randwrite \
    -pool=<your_pool> -runtime=60 -rbdname=testimg
...or from several such jobs run in parallel, each over a separate RBD
image.
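For example, a fio job file like this one (just a sketch - the image names
testimg1/testimg2 are placeholders you have to create with "rbd create"
beforehand) runs two such jobs in parallel over two separate images:

[global]
ioengine=rbd
direct=1
bs=4k
iodepth=128
rw=randwrite
runtime=60
pool=<your_pool>

[img1]
rbdname=testimg1

[img2]
rbdname=testimg2

Then sum the iops reported for all jobs to get the total.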
This is a "random write bandwidth" test, and, in fact, it's not the most
useful one - the single-thread latency usually does matter more than just
total bandwidth. To test for it, run:
fio -ioengine=rbd -direct=1 -name=test -bs=4k -iodepth=1 -rw=randwrite \
    -pool=<your_pool> -runtime=60 -rbdname=testimg
You'll get a pretty low number (< 100 iops for HDD clusters, 500-1000 for
SSD clusters). It is expected to be low: anything above 1000 iops (< 1 ms
latency; single-thread iops = 1 / average latency) is hard to achieve with
Ceph no matter what disks you're using. Also, single-thread latency does
not depend on the number of OSDs in the cluster, because the workload is
not parallel.
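(Illustrative conversion: an average completion latency of 2 ms at
iodepth=1 means roughly 1 / 0.002 = 500 single-thread write iops; 1 ms
corresponds to ~1000 iops.)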
However, you can also test the iops of a single OSD by creating a pool
with size=1 and using a custom benchmark tool we made with colleagues from
a Russian Ceph chat... we can publish it here a bit later if you want :).
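In the meantime, here is a rough sketch of the idea using only standard
commands (not our tool; pool and image names are placeholders, recent Ceph
versions may require --yes-i-really-mean-it for size=1, and note that a
size=1 pool still spreads PGs over several OSDs - to hit exactly one OSD
you'd additionally pin the pool to it with a CRUSH rule, which I'm omitting
here):

ceph osd pool create bench1 64 64
ceph osd pool set bench1 size 1
ceph osd pool set bench1 min_size 1
rbd create bench1/testimg --size 10G
fio -ioengine=rbd -direct=1 -name=test -bs=4k -iodepth=1 -rw=randwrite \
    -pool=bench1 -rbdname=testimg -runtime=60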
At some point I would expect the CPU to be the bottleneck. People here
have always been saying that for better latency you should get fast CPUs.
It would be nice to know what CPU clock speed (GHz) you are testing at,
and how that scales. Replication 1-3 and erasure coding probably also take
a hit.
How do you test the maximum iops of an OSD? (Just curious, so I can test
mine.)
A while ago I posted here a CephFS test on SSD with replication 1 that was
performing nowhere near native speed, asking if this was normal, but I
never got a response to it. I remember that they sent everyone a
questionnaire asking whether they should focus more on performance - now I
wish I had checked that box ;)
--
With best regards,
Vitaliy Filippov