Hi Mark, Greg and Kyle,

Sorry for the late response, and thanks for giving me some directions to look into. We have exactly the same setup for OSDs and pool replicas (I even tried to create the same number of PGs within the small cluster), but I can still reproduce this consistently.

This is the command I run:

$ rados bench -p perf_40k_PG -b 5000 -t 3 --show-time 10 write

With 24 OSDs:
  Average Latency: 0.00494123
  Max latency:     0.511864
  Min latency:     0.002198

With 330 OSDs:
  Average Latency: 0.00913806
  Max latency:     0.021967
  Min latency:     0.005456

In terms of the CRUSH rule, we are using the default one. The small cluster has 3 OSD hosts (11 + 11 + 2 OSDs), while the large cluster has 30 OSD hosts (11 * 30 OSDs).

I have a couple of questions:

1. Could the latency come from having only a three-layer hierarchy (root -> host -> OSD)? We are using the default straw bucket type, whose selection is O(N), so as the number of hosts grows the CRUSH computation grows with it. I suspect not, since my understanding is that the computation is on the order of microseconds (see the rough sketch I appended below my signature).
2. Could it be that, with more OSDs, the cluster has to maintain far more connections between OSDs, which slows things down?
3. Is there anything else I might have missed?

Thanks all for the constant help.

Guang
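P.S. To double-check my "microseconds" assumption in question 1, here is a rough Python sketch of an O(N) straw-style draw. This is not Ceph's actual CRUSH code; the names draw/straw_select, the hash choice, and the equal host weights are just mine for illustration.

import hashlib

def draw(pg_id, host_id, replica):
    """Deterministic pseudo-random value in [0, 1) for (pg, host, replica)."""
    h = hashlib.sha1(f"{pg_id}:{host_id}:{replica}".encode()).digest()
    return int.from_bytes(h[:8], "big") / 2.0**64

def straw_select(pg_id, replica, host_weights):
    """Pick one host: the largest weight-scaled draw wins. O(len(host_weights))."""
    best_host, best_score = None, -1.0
    for host_id, weight in host_weights.items():
        score = draw(pg_id, host_id, replica) * weight  # the "straw" length
        if score > best_score:
            best_host, best_score = host_id, score
    return best_host

# 30 equal-weight hosts, choose 3 replicas for one PG.
# (Real CRUSH also rejects duplicate picks and retries; omitted here.)
hosts = {h: 1.0 for h in range(30)}
print([straw_select(1234, r, hosts) for r in range(3)])

Even with 30 hosts this is only about 30 hash-and-compare steps per replica choice (real straw/straw2 scales the draws by weight differently and handles collisions), so the mapping cost should stay far below the millisecond-level difference we see in the bench numbers.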
On 2013-10-22, at 10:22 PM, Guang Yang <yguang11@xxxxxxxxx> wrote: