Re: Ceph Luminous performance - how to calculate expected results

On 2018-02-14 20:14, Steven Vacaroaia wrote:

Hi,
 
It is very useful to "set up expectations" from a performance perspective.
 
I have a cluster using 3 DELL R620 servers with 64 GB RAM and a 10 Gb cluster network.

I've seen numerous posts and articles on the topic mentioning the following formula
(for disks with the WAL/DB collocated on them):
 
aggregate OSD throughput / replication / 2
 
Example:
My HDDs are capable of 150 MB/s.
With 6 OSDs, expected throughput should be around 225 MB/s for a pool with replication = 2
(150 x 6 / 2 / 2)
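 
As a quick sanity check of that arithmetic (a minimal shell sketch; the 150 MB/s per-disk figure and the extra "/ 2" for the collocated WAL/DB are simply the assumptions from the rule of thumb above):
 
# rule of thumb: (per-disk MB/s x number of OSDs) / replication / 2
# the trailing "/ 2" accounts for the double write when the WAL/DB lives on the same disk
DISK_MBPS=150    # measured sequential throughput of one HDD
NUM_OSDS=6
REPLICATION=2
echo $(( DISK_MBPS * NUM_OSDS / REPLICATION / 2 ))    # prints 225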
 
How would one assess the impact of using an SSD for the WAL/DB, i.e. what performance gains should I expect?
Example:
  adding a 500 MB/s SSD for every 2 HDDs
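 
(To make that layout concrete, a rough ceph-volume sketch, assuming bluestore and hypothetical device names: /dev/sdb and /dev/sdc are the HDDs, /dev/sdd1 and /dev/sdd2 are partitions on the shared SSD.)
 
# one OSD per HDD, each with its RocksDB/WAL placed on a partition of the SSD
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdd1
ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/sdd2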
 
Should I expect that kind of throughput on the client (e.g. a Windows VM running on a datastore created on an RBD image shared via iSCSI)?
 
 
The reason I am asking is that, despite rados bench meeting the expectation, the local performance test is 4 times worse.
 
rados bench -p rbd 120 write --no-cleanup && rados bench -p rbd  120 seq
 
Total time run:         120.813979
Total writes made:      6182
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     204.678
Stddev Bandwidth:       36.2292
Max bandwidth (MB/sec): 280
Min bandwidth (MB/sec): 44
Average IOPS:           51
Stddev IOPS:            9
Max IOPS:               70
Min IOPS:               11
Average Latency(s):     0.312613
Stddev Latency(s):      0.524001
Max latency(s):         2.61579
Min latency(s):         0.0113714
  
 
Total time run:       113.850422
Total reads made:     6182
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   217.197
Average IOPS:         54
Stddev IOPS:          7
Max IOPS:             80
Min IOPS:             31
Average Latency(s):   0.293956
Max latency(s):       1.99958
Min latency(s):       0.0192862
 
Local test using CrystalDiskMark 
 
57 MB/s seq read
43 MB/s seq write
 
 
 


Hi Steven,

I do not believe there is a formula as such, since there are many factors involved. The "aggregate OSD throughput / replication / 2" rule is probably related to the theoretical peak of a filestore-based OSD with a collocated journal; in practice you will not reach that number because of other factors. If you have a link with performance formulas, it would be interesting to see.

For your test, I would check the following:

The rados benchmark defaults to 4M objects and 16 concurrent threads. You need to configure CrystalDiskMark with similar parameters.
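 
For example, the same parameters can be made explicit on the rados side (a sketch; -b sets the object size in bytes and -t the number of concurrent operations), and CrystalDiskMark should then use a sequential profile with a comparable block size and queue depth:
 
rados bench -p rbd 120 write -b 4194304 -t 16 --no-cleanup
rados bench -p rbd 120 seq -t 16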

The iSCSI target gateway should be able to sustain the rados throughput you are seeing without too much of a drop; double-check how your client initiator and target are configured. You can also run atop or another performance tool on your iSCSI gateway to see whether you have any resource bottlenecks.
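 
For example, while the benchmark is running (assuming atop and the sysstat tools are available on the gateway):
 
atop 2           # overall CPU / disk / network pressure, 2-second refresh
iostat -x 2      # per-device utilization and latency
sar -n DEV 2     # per-interface network throughput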

Maged

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
