Generally you can measure your bottleneck with a tool like atop/collectl/sysstat and see how busy (i.e. %busy, %util) your resources are: CPU, disks, network.
As was pointed out, in your case you have most probably maxed out your disks. But the above tools should help as you grow and tune your cluster.
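For example, a quick way to watch per-disk utilization while a benchmark runs (a minimal sketch using iostat from the sysstat package, run on each OSD node):

# Extended per-device statistics, refreshed every second.
# The %util column shows how saturated each OSD SSD is, and the CPU line
# (%iowait vs %idle) hints at whether the disks or the CPU are the limit.
iostat -x 1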
Hi everybody,
I want to squeeze all the performance out of Ceph (we are using Jewel 10.2.7).
We have set up a test environment with 2 nodes, each with the same configuration:
- CentOS 7.3
- 24 CPUs (12 physical cores with hyper-threading)
- 32 GB of RAM
- 2x 100 Gbit/s Ethernet cards
- 2x SSDs in RAID, dedicated to the OS
- 4x SATA 6 Gbit/s SSDs for OSDs
We are already expecting the following bottlenecks:
- [ SATA speed x number of disks ] = 6 Gbit/s x 4 = 24 Gbit/s
- [ network speed x number of bonded cards ] = 100 Gbit/s x 2 = 200 Gbit/s
So the minimum of the two is 24 Gbit/s per node (not taking protocol overhead into account).
24 Gbit/s per node x 2 nodes = 48 Gbit/s of maximum hypothetical speed (mhs), gross.
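For later comparison with the rados bench numbers (which are reported in MB/sec), the same ceilings expressed in MB/s; a rough conversion assuming 1 Gbit/s ≈ 125 MB/s and ignoring protocol overhead:

# 4 SATA SSDs x 6 Gbit/s x 125 MB/s per Gbit/s = 3000 MB/s per node
echo "per-node disk ceiling: $(( 4 * 6 * 125 )) MB/s"
# x 2 nodes = 6000 MB/s, i.e. the 48 Gbit/s figure above
echo "two-node disk ceiling: $(( 2 * 4 * 6 * 125 )) MB/s"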
Here are the tests:
///////IPERF2/////// Tests are quite good, scoring about 88% of the single-link (100 Gbit/s) bottleneck.
Note: iperf2 can only use one link of a bond (it's a well-known issue).
[ ID] Interval Transfer Bandwidth
[ 12] 0.0-10.0 sec 9.55 GBytes 8.21 Gbits/sec
[ 3] 0.0-10.0 sec 10.3 GBytes 8.81 Gbits/sec
[ 5] 0.0-10.0 sec 9.54 GBytes 8.19 Gbits/sec
[ 7] 0.0-10.0 sec 9.52 GBytes 8.18 Gbits/sec
[ 6] 0.0-10.0 sec 9.96 GBytes 8.56 Gbits/sec
[ 8] 0.0-10.0 sec 12.1 GBytes 10.4 Gbits/sec
[ 9] 0.0-10.0 sec 12.3 GBytes 10.6 Gbits/sec
[ 10] 0.0-10.0 sec 10.2 GBytes 8.80 Gbits/sec
[ 11] 0.0-10.0 sec 9.34 GBytes 8.02 Gbits/sec
[ 4] 0.0-10.0 sec 10.3 GBytes 8.82 Gbits/sec
[SUM] 0.0-10.0 sec 103 GBytes 88.6 Gbits/sec
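(For reference, a run like the one above can be reproduced with iperf2's parallel-stream option; the address below is a placeholder for the second node:)

# on node 1: start the server
iperf -s
# on node 2: 10 parallel TCP streams for 10 seconds, which produces the
# per-stream lines and the [SUM] row shown above
iperf -c 192.168.1.1 -P 10 -t 10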
///////RADOS BENCH///////
Taking into consideration the maximum hypothetical speed of 48 Gbit/s (due to the disk bottleneck), the results are not good enough:
- Average write bandwidth is almost 5-7 Gbit/s (12.5% of the mhs)
- Average sequential read bandwidth is almost 24 Gbit/s (50% of the mhs)
- Average random read bandwidth is almost 27 Gbit/s (56.25% of the mhs)
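Those percentages come from converting the average bandwidths reported below (in MB/sec) to Gbit/s; a rough sanity check of the conversion, treating MB as decimal:

# average MB/s x 8 / 1000 ~= Gbit/s, to compare against the 48 Gbit/s ceiling
awk 'BEGIN {
    printf "write:     %.1f Gbit/s\n", 601.406 * 8 / 1000
    printf "seq read:  %.1f Gbit/s\n", 2994.61 * 8 / 1000
    printf "rand read: %.1f Gbit/s\n", 3337.71 * 8 / 1000
}'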
Here are the reports.
Write:
# rados bench -p scbench 10 write --no-cleanup
Total time run: 10.229369
Total writes made: 1538
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 601.406
Stddev Bandwidth: 357.012
Max bandwidth (MB/sec): 1080
Min bandwidth (MB/sec): 204
Average IOPS: 150
Stddev IOPS: 89
Max IOPS: 270
Min IOPS: 51
Average Latency(s): 0.106218
Stddev Latency(s): 0.198735
Max latency(s): 1.87401
Min latency(s): 0.0225438
Sequential read:
# rados bench -p scbench 10 seq
Total time run: 2.054359
Total reads made: 1538
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 2994.61
Average IOPS: 748
Stddev IOPS: 67
Max IOPS: 802
Min IOPS: 707
Average Latency(s): 0.0202177
Max latency(s): 0.223319
Min latency(s): 0.00589238
Random read:
# rados bench -p scbench 10 rand
Total time run: 10.036816
Total reads made: 8375
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 3337.71
Average IOPS: 834
Stddev IOPS: 78
Max IOPS: 927
Min IOPS: 741
Average Latency(s): 0.0182707
Max latency(s): 0.257397
Min latency(s): 0.00469212
//------------------------------------
It seems like there is some bottleneck somewhere that we are underestimating.
Can you help me find it?