Dear all,
We currently run a small Ceph cluster on 2 machines and we are wondering what
theoretical maximum BW/IOPS we can achieve through RBD with this setup.
Here are the environment details:
- The Ceph release is Octopus 15.2.1 running on CentOS 8; both
machines have 180 GB RAM, 72 cores, and 40 x 1.8 TB SSDs each
- On the network side, we deployed two isolated 100 Gb/s networks for front
and back connectivity
- Since all disks have the same performance, we created 1 OSD per SSD
using BlueStore (default setup with LVM), reaching a total of 80 OSDs (40
OSDs per machine)
- On top of that we have a single 2x replicated RBD pool with 2048 PGs
in order to reach a global average of 50 PGs per OSD (our experiments
with 100 PGs/OSD didn't provide any performance improvement, only extra CPU
consumption)
- We kept the default settings for all RBD images created for the benchmarks
(4 MB object size, 4 MB stripe unit, stripe count of 1)
- The CRUSH map and replication rules are very simple (2 hosts, 40
OSDs per host, all with the same device class and weight)
- All tuning settings (cache sizing, op threads, BlueStore and RocksDB
options, etc.) are the defaults shipped with the Octopus release (a rough
back-of-envelope of the ceilings this hardware implies is sketched right below).
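To make the question a bit more concrete, here is a rough back-of-envelope of
the ceilings implied by this hardware. The per-SSD throughput figures are
assumptions (typical SATA/SAS SSD class numbers), not measured values:

# Rough client-facing ceilings implied by the hardware described above.
# The per-SSD throughput figures are assumptions, not measured values.
NODES = 2
SSDS_PER_NODE = 40
SSD_READ_GBS = 0.5        # assumed per-disk sequential read, GB/s
SSD_WRITE_GBS = 0.45      # assumed per-disk sustained write, GB/s
FRONT_NET_GBS = 100 / 8   # one 100 Gb/s front link per node, in GB/s
REPLICA_SIZE = 2          # pool size = 2

net_ceiling = NODES * FRONT_NET_GBS
disk_read_ceiling = NODES * SSDS_PER_NODE * SSD_READ_GBS
disk_write_ceiling = NODES * SSDS_PER_NODE * SSD_WRITE_GBS

# Reads are served from a single replica, so only the front network or the
# aggregate disk read speed can cap them.
read_ceiling = min(net_ceiling, disk_read_ceiling)

# Every client write is persisted REPLICA_SIZE times (ignoring BlueStore
# WAL/metadata overhead), so the disk-side budget is divided by the pool size.
write_ceiling = min(net_ceiling, disk_write_ceiling / REPLICA_SIZE)

print(f"read ceiling  ~{read_ceiling:.0f} GB/s")   # ~25 GB/s (network bound)
print(f"write ceiling ~{write_ceiling:.0f} GB/s")  # ~18 GB/s (disk bound)

If those assumptions hold, the read number below sits right at the network
ceiling, while the write numbers stay well under the disk-side ceiling.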
Here are the best values observed so far using both rados bench and fio
with many different setups (varying numbers of clients, threads, RBD
images, block sizes from 4k to 4m, random/sequential access, iodepth, etc.);
a sketch of a typical fio run follows the numbers:
- Read BW: 24 GB/s (it looks like we reached the combined network capacity of
both machines here)
- Read IOPS: 600k
- Write BW: 7 GB/s
- Write IOPS: 100k
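For reference, a minimal sketch of the kind of fio sweep described above,
driven from Python; the pool, image, and client names are placeholders and
the exact parameter combinations varied from run to run:

import subprocess

# Placeholder names; adjust to the actual pool, image, and cephx user.
POOL = "rbd_bench"
IMAGE = "bench_img"
CLIENT = "admin"

def run_fio(rw: str, bs: str, iodepth: int, numjobs: int, runtime_s: int = 60) -> str:
    """Run one fio pass against an RBD image through librbd (ioengine=rbd)."""
    cmd = [
        "fio",
        "--name=rbd_bench",
        "--ioengine=rbd",            # librbd engine, no kernel mapping needed
        f"--clientname={CLIENT}",    # cephx user, without the 'client.' prefix
        f"--pool={POOL}",
        f"--rbdname={IMAGE}",
        f"--rw={rw}",                # randread / randwrite / read / write
        f"--bs={bs}",                # 4k ... 4m
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        "--direct=1",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    # Small sweep over block sizes for random writes, as an example.
    for bs in ("4k", "64k", "4m"):
        print(run_fio("randwrite", bs, iodepth=32, numjobs=4))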
Those are simply the maximum numbers obtained regardless of latency, as we
first want to stress the infrastructure to see the maximum throughput & IOPS
we can achieve. Latency measurements will come afterwards.
We also have the feeling that the 2x replication of the RBD pool is a
significant penalty with only 2 nodes in the cluster, dividing the maximum
speeds by more than 2; a quick sanity check of the raw bytes involved is
sketched below. This will probably have much less impact when scaling the
cluster out with additional nodes.
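As a quick sanity check of what the 2x replication means in raw bytes, using
the best write number above and assuming writes spread evenly across the OSDs:

# Quick sanity check of the write amplification with size=2 replication.
# Uses the measured client write bandwidth above; the per-OSD view assumes
# writes are spread evenly across all 80 OSDs.
CLIENT_WRITE_GBS = 7          # best observed client-facing write BW, GB/s
REPLICA_SIZE = 2
NODES = 2
OSDS = 80

bytes_on_disk = CLIENT_WRITE_GBS * REPLICA_SIZE       # 14 GB/s actually persisted
per_node = bytes_on_disk / NODES                      # ~7 GB/s hitting each host
per_osd_mbs = bytes_on_disk / OSDS * 1000             # ~175 MB/s per OSD

print(f"{bytes_on_disk} GB/s written to SSDs in total")
print(f"~{per_node:.0f} GB/s per host, ~{per_osd_mbs:.0f} MB/s per OSD")

If those figures are roughly right, the individual SSDs are far from
saturated, which would suggest the disks themselves are not the bottleneck.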
We also noticed that at some points during recovery operations (e.g.
rebalancing PGs after a new OSD was added to the cluster) the total
read/write throughput and IOPS climb to several GB/s and millions of
IOPS, so we wonder whether we could achieve better numbers with legitimate
RBD client load.
Would you like to share numbers from your own setups, or do you have any
hints for potential improvements?
Thanks.
Regards,
--
Vincent Kherbache
R&D Director
Titan Datacenter