Hello, I'm not a "storage-guy" so please excuse me if I'm missing / overlooking something obvious. My question is in the area "what kind of performance am I to expect with this setup". We have bought servers, disks and networking for our future ceph-cluster and are now in our "testing-phase" and I simply want to understand if our numbers line up, or if we are missing something obvious. Background, - cephmon1, DELL R730, 1 x E5-2643, 64 GB - cephosd1-6, DELL R730, 1 x E5-2697, 64 GB - each server is connected to a dedicated 50 Gbe network, with Mellanox-4 Lx cards (teamed into one interface, team0). In our test we only have one monitor. This will of course not be the case later on. Each OSD, has the following SSD's configured as pass-through (not raid 0 through the raid-controller), - 2 x Dell 1.6TB 2.5" SATA MLC MU 6Gbs SSD (THNSF81D60CSE), only spec I can find on Dell's homepage says "Data Transfer Rate 600 Mbps" - 4 x Intel SSD DC S3700 (https://ark.intel.com/products/71916/Intel-SS D-DC-S3700-Series-800GB-2_5in-SATA-6Gbs-25nm-MLC) - 3 HDD's, which is uninteresting here. At the moment I'm only interested in the performance of the SSD-pool. Ceph-cluster is created with ceph-ansible with "default params" (ie. have not added / changed anything except the necessary). When ceph-cluster is up, we have 54 OSD's (36 SSD, 18HDD). The min_size is 3 on the pool. Rules are created as follows, $ > ceph osd crush rule create-replicated ssd-rule default host ssd $ > ceph osd crush rule create-replicated hdd-rule default host hdd Testing is done on a separate node (same nic and network though), $ > ceph osd pool create ssd-bench 512 512 replicated ssd-rule $ > ceph osd pool application enable ssd-bench rbd $ > rbd create ssd-image --size 1T --pool ssd-pool $ > rbd map ssd-image --pool ssd-bench $ > mkfs.xfs /dev/rbd/ssd-bench/ssd-image $ > mount /dev/rbd/ssd-bench/ssd-image /ssd-bench Fio is then run like this, $ > actions="read randread write randwrite" blocksizes="4k 128k 8m" tmp_dir="/tmp/" for blocksize in ${blocksizes}; do for action in ${actions}; do rm -f ${tmp_dir}${action}_${blocksize}_${suffix} fio --directory=/ssd-bench \ --time_based \ --direct=1 \ --rw=${action} \ --bs=$blocksize \ --size=1G \ --numjobs=100 \ --runtime=120 \ --group_reporting \ --name=testfile \ --output=${tmp_dir}${action}_${blocksize}_${suffix} done done After running this, we end up with these numbers read_4k iops : 159266 throughput : 622 MB / sec randread_4k iops : 151887 throughput : 593 MB / sec read_128k iops : 31705 throughput : 3963.3 MB / sec randread_128k iops : 31664 throughput : 3958.5 MB / sec read_8m iops : 470 throughput : 3765.5 MB / sec randread_8m iops : 463 throughput : 3705.4 MB / sec write_4k iops : 50486 throughput : 197 MB / sec randwrite_4k iops : 42491 throughput : 165 MB / sec write_128k iops : 15907 throughput : 1988.5 MB / sec randwrite_128k iops : 15558 throughput : 1944.9 MB / sec write_8m iops : 347 throughput : 2781.2 MB / sec randwrite _8m iops : 347 throughput : 2777.2 MB / sec Ok, if you read all way here, the million dollar question is of course if the numbers above are in the ballpark of what to expect, or if they are low. 
The main reason I'm a bit uncertain about the numbers above (and this may sound fuzzy) is that we did a POC a couple of months ago with fewer OSDs, and those numbers were as follows. (That is, if I remember the configuration correctly; unfortunately we only saved the numbers, not the *exact* configuration, *sigh*. The networking was still the same though.)

read_4k          iops: 282303    throughput: 1102.8 MB/s   (b)
randread_4k      iops: 253453    throughput: 990.52 MB/s   (b)
read_128k        iops: 31298     throughput: 3912 MB/s     (w)
randread_128k    iops: 9013      throughput: 1126.8 MB/s   (w)
read_8m          iops: 405       throughput: 3241.4 MB/s   (w)
randread_8m      iops: 369       throughput: 2957.8 MB/s   (w)
write_4k         iops: 80644     throughput: 315 MB/s      (b)
randwrite_4k     iops: 53178     throughput: 207 MB/s      (b)
write_128k       iops: 17126     throughput: 2140.8 MB/s   (b)
randwrite_128k   iops: 11654     throughput: 2015.9 MB/s   (b)
write_8m         iops: 258       throughput: 2067.1 MB/s   (w)
randwrite_8m     iops: 251       throughput: 1456.9 MB/s   (w)

Here (b) marks a POC number that is higher than in the current setup, and (w) one that is lower. What I would have expected from adding more OSDs is an increase in *all* numbers. The read_4k throughput and iops numbers in the current setup are not even close to the POC, which makes me wonder whether these "new" numbers are what they "are supposed to be", or whether I'm missing something obvious.

Ehm, in this new setup we are running with MTU 1500; I think we had the POC at 9000. But the difference on read_4k is roughly 400 MB/s, and I wonder if the MTU alone will make up for that.

Is the above a good way of measuring our cluster, or are there better, more reliable ways of measuring it?

Is there a way to calculate this "theoretically" (i.e. with 6 nodes and 36 SSDs we should get these numbers) and then compare it to reality?

Again, I'm not a storage guy and haven't really done this before, so please excuse my layman's terms.

Thanks for Ceph, and keep up the awesome work!

Best regards,
Patrik Martinsson
Sweden
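P.S. Regarding the MTU difference: before flipping the new setup to 9000, I suppose we can at least verify what actually passes end-to-end today. A quick check (osd-host is a placeholder for one of the OSD nodes):

$ > ip link show team0              # shows the MTU currently set on the teamed interface
$ > ping -M do -s 8972 osd-host     # DF bit set; 8972 = 9000 minus 28 bytes of IP+ICMP headers,
                                    # so this only succeeds if the whole path carries jumbo frames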
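P.P.S. On "better ways of measuring": as a complement to fio on top of xfs+rbd, rados bench talks to the pool directly and takes the filesystem and block layers out of the equation. A sketch against our ssd-bench pool:

$ > rados bench -p ssd-bench 60 write --no-cleanup   # 60 s of 4 MB object writes
$ > rados bench -p ssd-bench 60 seq                  # sequential reads of those objects
$ > rados bench -p ssd-bench 60 rand                 # random reads
$ > rados -p ssd-bench cleanup                       # delete the benchmark objects

If the rados numbers come out much higher than the fio-on-rbd numbers, the gap is in the rbd/filesystem layers rather than in the OSDs or the network.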
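P.P.P.S. My own back-of-the-envelope attempt at the "theoretical" question, assuming the S3700 spec numbers (~500 MB/s sequential read, ~460 MB/s sequential write per drive) roughly hold for all 36 SSDs, and 3x replication:

  reads:  36 x ~500 MB/s = ~18000 MB/s raw, but a single client is capped by
          its 50 GbE link at ~6250 MB/s
  writes: 36 x ~460 MB/s = ~16560 MB/s raw, / 3 for replication = ~5520 MB/s,
          again before network and Ceph overhead

These are only bandwidth ceilings for large blocks; for 4k the bottleneck is per-op latency rather than bandwidth, so I don't expect them to say much there.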