On 10/21/2013 09:13 AM, Guang Yang wrote:
Dear ceph-users,
Hi!
Recently I deployed Ceph clusters with RadosGW, first a small one (24 OSDs) and then a much bigger one (330 OSDs). When I used rados bench to test the small cluster (24 OSDs), the average latency was around 3ms (object size 5K), while for the larger one (330 OSDs) it was around 7ms (object size 5K), more than twice that of the small cluster.
Did you have the same number of concurrent requests going?
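Concurrency matters because of Little's law: at a fixed throughput, average latency grows in proportion to the number of requests in flight (rados bench controls this with its -t option). A minimal sketch, assuming hypothetical throughput figures that were not measured in this thread:

```python
def little_latency_ms(in_flight: int, ops_per_sec: float) -> float:
    """Little's law: mean latency W = L / lambda (requests in flight / throughput)."""
    if ops_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return in_flight / ops_per_sec * 1000.0


# Hypothetical: the same 5000 ops/s throughput with two different concurrency levels.
for threads in (16, 32):
    print(threads, round(little_latency_ms(threads, 5000.0), 2))
```

Doubling the number of concurrent operations at the same throughput doubles the average latency, so the two bench runs are only comparable if they used the same concurrency.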
The OSDs in the two clusters have the same configuration: SAS disks, each split into two partitions, one for the journal and the other for metadata. For PG counts, the small cluster was tested with a pool of 100 PGs, while the large cluster's pool has 43333 PGs (since I plan to scale the cluster further, I chose a much larger PG count).
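To compare the two setups it helps to look at PGs per OSD rather than raw PG counts. A minimal sketch, assuming a replica count of 3 for both pools (the thread does not state it) and using the commonly cited rule of thumb of about 100 PGs per OSD, divided by the replica count and rounded up to a power of two:

```python
def pgs_per_osd(pg_num: int, replicas: int, num_osds: int) -> float:
    """Average number of PG replicas hosted by each OSD."""
    return pg_num * replicas / num_osds


def suggested_pg_num(num_osds: int, replicas: int, target_per_osd: int = 100) -> int:
    """Rule of thumb: (OSDs * target) / replicas, rounded up to a power of two."""
    raw = num_osds * target_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power


REPLICAS = 3  # assumption: not stated in the original thread
print(pgs_per_osd(100, REPLICAS, 24))      # small cluster
print(pgs_per_osd(43333, REPLICAS, 330))   # large cluster
print(suggested_pg_num(330, REPLICAS))
```

Under that assumption the small cluster carries about 12.5 PGs per OSD and the large one about 394, so the two tests differ in per-OSD PG load as well as in OSD count.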
Forgive me if this is a silly question, but were the pools using the same level of replication?
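Replication level matters because a replicated write is acknowledged to the client only after every replica in the placement group has it, so one slow disk in the set slows the whole write. A minimal sketch, assuming each OSD is independently "slow" with some hypothetical probability p:

```python
def p_write_hits_slow(p_slow_osd: float, replicas: int) -> float:
    """Probability that at least one of the `replicas` OSDs serving a write is slow."""
    if not 0.0 <= p_slow_osd <= 1.0:
        raise ValueError("p_slow_osd must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - (1.0 - p_slow_osd) ** replicas


# Hypothetical: 1% of OSDs are slow.
for r in (1, 2, 3):
    print(r, round(p_write_hits_slow(0.01, r), 4))
```

More replicas per write means more chances to land on a slow disk, which raises the average latency even when most disks are healthy.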
Does my test result make sense? That is, is it expected that latency gets worse as the number of PGs and OSDs increases?
You wouldn't necessarily expect a larger cluster to show higher latency if the nodes, pools, etc. were all configured exactly the same, especially if you were using the same amount of concurrency. It's possible that some slow drives in the larger cluster are pulling the average latency up. If the larger cluster has more disks per node, that could also explain it.
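One way to look for slow drives is to compare each OSD's latency against the cluster median. A minimal sketch, assuming you have already collected per-OSD latencies in milliseconds into a dict (the collection step, the sample values, and the 3x threshold are all hypothetical):

```python
from statistics import median


def find_slow_osds(latency_ms: dict[int, float], factor: float = 3.0) -> list[int]:
    """Return OSD ids whose latency exceeds `factor` times the median latency."""
    if not latency_ms:
        return []
    mid = median(latency_ms.values())
    return sorted(osd for osd, lat in latency_ms.items() if lat > factor * mid)


# Hypothetical sample in which OSD 7 is an outlier.
sample = {0: 3.1, 1: 2.9, 2: 3.4, 7: 25.0, 11: 3.0}
print(find_slow_osds(sample))  # [7]
```

Using the median rather than the mean keeps a handful of very slow OSDs from raising the threshold and hiding themselves.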
Are there any other differences you can think of?
Thanks, Guang _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com