It depends on what stage you are in. Before production, you should first make sure your SSDs are suitable for Ceph, either because they are recommended by other Ceph users or because you have tested them yourself for sync write performance with the fio tool, as outlined earlier. Then, after you build your cluster, you can use rados and/or rbd benchmark tests to benchmark the cluster and find bottlenecks with atop/sar/collectl, which will help you tune it.

Even though you did see improvements, your cluster with 27 SSDs should give much higher numbers than 3k IOPS. If you are running rados bench while there is other client i/o, the number reported by the tool will obviously be lower than what the cluster is actually delivering; you can find the total via the ceph status command, which prints cluster-wide throughput and IOPS. If the total is still low, I would recommend running the raw disk fio test; maybe the disks are not suitable.

When you removed your 9 bad disks out of 36 and your performance doubled, you still had 2 other disks slowing you down, meaning near 100% busy? That makes me suspect the disk model used is not good. For those near-100%-busy disks, can you also measure their raw disk IOPS at that load? (I am not sure atop shows this; if not, use sar/sysstat/iostat/collectl.)
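To make the above concrete, here are rough sketches of the commands I have in mind; device names, pool names and run times are placeholders, so adjust them to your setup.

For the pre-production SSD check, a single-threaded sync write test against the raw device (this writes directly to the disk, so only run it on a drive with no data you care about):

    fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based \
        --group_reporting --name=ssd-sync-write-test

Roughly speaking, SSDs that handle Ceph's sync/journal writes well sustain thousands of IOPS on this test, while unsuitable consumer drives often fall to a few hundred.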
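For the cluster-level benchmark, rados bench against a test pool is the simplest, e.g. a 4K write test followed by a random read test:

    # 4K writes, 16 concurrent ops, keep the objects for the read test
    rados bench -p testpool 60 write -b 4096 -t 16 --no-cleanup
    # random reads against the objects written above, then clean up
    rados bench -p testpool 60 rand -t 16
    rados -p testpool cleanup

While other client i/o is running, you can watch the cluster-wide totals with:

    ceph -s    # the client io line shows total cluster throughput and IOPS
    ceph -w    # streams the same information continuously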
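And to see what the busy disks themselves are doing at that load, iostat from the sysstat package is probably the easiest:

    iostat -x 1
    # r/s and w/s are the per-disk read/write IOPS, %util shows how busy each disk is

If the two remaining slow disks sit near 100% busy at only a few hundred IOPS each, that again points at the drive model rather than at Ceph.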
Maged

On 2017-10-25 23:44, Russell Glaue wrote:
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com