Re: Rados bench result when increasing OSDs

Guang Yang <yguang11@xxxxxxxxx> · Thu, 24 Oct 2013 21:31:49 +0800

Hi Mark, Greg and Kyle,
Sorry to respond this late, and thanks for pointing me in directions to look at.

We have exactly the same setup for the OSDs and pool replicas (I even tried creating the same number of PGs within the small cluster); however, I can still reproduce this consistently.

This is the command I ran:
$ rados bench -p perf_40k_PG -b 5000 -t 3 --show-time 10 write

With 24 OSDs:
Average Latency: 0.00494123
Max latency:     0.511864
Min latency:     0.002198

With 330 OSDs:
Average Latency: 0.00913806
Max latency:     0.021967
Min latency:     0.005456
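
For a rough sense of what these averages imply for throughput, here is a minimal sketch. It assumes Little's law applies (throughput is roughly in-flight ops divided by average latency), that the 3 concurrent ops from -t 3 are always in flight, and that the latencies above are in seconds. The function name is only illustrative.

```python
def estimated_ops_per_sec(concurrency, avg_latency_s):
    """Little's law estimate: ops/s ~= in-flight ops / average latency (seconds)."""
    return concurrency / avg_latency_s

# Average latencies reported above, assumed to be in seconds.
SMALL_AVG = 0.00494123   # 24 OSDs
LARGE_AVG = 0.00913806   # 330 OSDs
CONCURRENCY = 3          # from "-t 3" in the rados bench command

small_ops = estimated_ops_per_sec(CONCURRENCY, SMALL_AVG)
large_ops = estimated_ops_per_sec(CONCURRENCY, LARGE_AVG)
latency_ratio = LARGE_AVG / SMALL_AVG

print(f"24 OSDs:  ~{small_ops:.0f} ops/s")
print(f"330 OSDs: ~{large_ops:.0f} ops/s")
print(f"average latency ratio (large/small): {latency_ratio:.2f}x")
```

This is only an estimate from the averages; it says nothing about where the extra latency comes from.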

In terms of the CRUSH rule, we are using the default one. The small cluster has 3 OSD hosts (11 + 11 + 2 OSDs), and the large cluster has 30 OSD hosts (11 OSDs each, 11 * 30).

I have a couple of questions:
 1. Could the latency be due to our having only a three-layer hierarchy (root -> host -> OSD)? We are using the Straw bucket type (the default), which has O(N) selection speed, so the computation grows as the number of hosts increases. I suspect not, since as I understand it the computation is on the order of microseconds.

 2. Could it be that, with more OSDs, the cluster needs to maintain far more connections between OSDs, which potentially slows things down?

 3. Is there anything else I might be missing?
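
To illustrate why question 1 describes the selection as O(N), here is a simplified straw-style sketch. It is not Ceph's actual CRUSH code: the hash (SHA-256 instead of CRUSH's own hash), the function name, and the host names are all assumptions for illustration, and scaling a uniform draw by weight is only a stand-in for the real straw-length computation. The point is just that every item in the bucket gets one draw per selection, so the work grows linearly with the number of hosts.

```python
import hashlib

def straw_select(items, key, attempt=0):
    """Pick one item by drawing a pseudo-random 'straw' for each; longest wins.

    items: list of (name, weight) pairs. Simplified illustration only.
    """
    best_item, best_draw = None, -1.0
    for name, weight in items:  # one hash per item: O(N) in bucket size
        digest = hashlib.sha256(f"{key}:{name}:{attempt}".encode()).digest()
        # Map the first 8 bytes to [0, 1), then scale by the item's weight.
        draw = int.from_bytes(digest[:8], "big") / 2**64 * weight
        if draw > best_draw:
            best_item, best_draw = name, draw
    return best_item

# Hypothetical bucket of 30 equally weighted hosts, as in the large cluster.
hosts = [(f"host{i}", 1.0) for i in range(30)]
pick = straw_select(hosts, "object-123")
print(pick)
```

The same key always maps to the same host, and going from 3 to 30 hosts means 30 hash computations per selection instead of 3.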

Thanks all for the constant help.

Guang  

On 2013-10-22, at 10:22 PM, Guang Yang <yguang11@xxxxxxxxx> wrote:

Hi Kyle and Greg, I will get back to you with more details tomorrow. Thanks for the response.

Thanks,
Guang

On 2013-10-22, at 9:37 AM, Kyle Bader <kyle.bader@xxxxxxxxx> wrote:

Besides what Mark and Greg said, it could be due to additional hops through network devices. What network devices are you using, what is the network topology, and does your CRUSH map reflect the network topology?

On Oct 21, 2013 9:43 AM, "Gregory Farnum" <greg@xxxxxxxxxxx> wrote:

On Mon, Oct 21, 2013 at 7:13 AM, Guang Yang <yguang11@xxxxxxxxx> wrote:

> Dear ceph-users,
> Recently I deployed a ceph cluster with RadosGW, from a small one (24 OSDs) to a much bigger one (330 OSDs).
>
> When using rados bench to test the small cluster (24 OSDs), the average latency was around 3ms (object size 5K), while for the larger one (330 OSDs) the average latency was around 7ms (object size 5K), roughly twice that of the small cluster.
>
> The OSDs within the two clusters have the same configuration: SAS disks, with two partitions per disk, one for the journal and the other for metadata.
>
> For PG numbers, the small cluster was tested with a pool having 100 PGs, and for the large cluster the pool has 43333 PGs (as I plan to scale the cluster further, I chose a much larger PG count).
>
> Does my test result make sense? For example, when the number of PGs and OSDs increases, might the latency get worse?

Besides what Mark said, can you describe your test in a little more detail? Writing/reading, length of time, number of objects, etc.

-Greg

Software Engineer #42 @ http://inktank.com | http://ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
