I would strongly reconsider your journaling setup (you do mention that you will revisit this): we have found that co-locating journals on the data disks does impact performance, and separating them out onto flash is usually a good idea. I'm also not sure of your networking setup, which can have a significant impact as well.
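For example (a sketch only -- the device names and ceph-deploy usage below are assumptions, not taken from your setup), a journal can be placed on a separate flash partition at OSD-prepare time, and raw network throughput between two nodes can be sanity-checked with iperf:

    # prepare an OSD with its journal on a separate flash partition
    # (/dev/sdc = data disk, /dev/sdb1 = SSD journal partition; example names)
    ceph-deploy osd prepare host1:/dev/sdc:/dev/sdb1

    # check raw 10GbE throughput between two cluster nodes
    iperf -s                    # on host1
    iperf -c host1 -t 30 -P 4   # on host2: 4 parallel streams for 30 seconds

If the iperf numbers come in well below line rate, I'd look at the network before tuning Ceph itself.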
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Sergio A. de Carvalho Jr.

Hi all,

I've set up a testing/development Ceph cluster consisting of 5 Dell PowerEdge R720xd servers (256GB RAM, 2x 8-core Xeon E5-2650 @ 2.60GHz, dual-port 10Gb Ethernet, 2x 900GB + 12x 4TB disks) running CentOS 6.5 and Ceph Hammer 0.94.6.

All servers use one 900GB disk for the root partition, and the other 13 disks are assigned to OSDs, so we have 5 x 13 = 65 OSDs in total. We also run 1 monitor on every host. Journals are 5GB partitions on each disk (this is something we will obviously need to revisit later).
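(For reference, a 5GB journal maps to the ceph.conf setting below; a sketch, assuming the size is set explicitly -- 5120 MB is simply the value implied by the 5GB partitions above:)

    [osd]
    # journal size in MB (the 5GB partitions described above)
    osd journal size = 5120

and the expected cluster shape can be confirmed with:

    ceph osd stat   # should report 65 osds: 65 up, 65 in
    ceph mon stat   # should report 5 monitors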
The purpose of this cluster will be to serve as the backend storage for Cinder volumes and Glance images in an OpenStack cloud.

With this setup, I'm getting what I consider "okay" performance:

    # rados -p images bench 5 write
    Maintaining 16 concurrent writes of 4194304 bytes for up to 5 seconds or 0 objects
    Total writes made:       394
    Write size:              4194304
    Bandwidth (MB/sec):      299.968
    Stddev Bandwidth:        127.334
    Max bandwidth (MB/sec):  348
    Min bandwidth (MB/sec):  0
    Average Latency:         0.212524
    Stddev Latency:          0.13317
    Max latency:             0.828946
    Min latency:             0.0707341

Does that look acceptable? How much more can I expect to achieve by fine-tuning and perhaps using a more efficient setup?

I do understand the bandwidth above is a product of running 16 concurrent writes with rather small object sizes (4MB). Bandwidth drops significantly with 64MB objects and 1 thread:

    # rados -p images bench 5 write -b 67108864 -t 1
    Maintaining 1 concurrent writes of 67108864 bytes for up to 5 seconds or 0 objects
    Total writes made:       7
    Write size:              67108864
    Bandwidth (MB/sec):      71.520
    Stddev Bandwidth:        24.1897
    Max bandwidth (MB/sec):  64
    Min bandwidth (MB/sec):  0
    Average Latency:         0.894792
    Stddev Latency:          0.0547502
    Max latency:             0.99311
    Min latency:             0.832765

Is such a drop expected?

Now, what I'm really concerned about is upload times. Uploading a randomly-generated 1GB file takes a bit too long:

    # time rados -p images put random_1GB /tmp/random_1GB
    real    0m35.328s
    user    0m0.560s
    sys     0m3.665s

Is this normal? If so, if I set this cluster up as a backend for Glance, does that mean uploading a 1GB image will take 35 seconds (plus whatever time Glance needs to do its own thing)?

Thanks,

Sergio
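(A quick sanity check on that number: 1024 MB / 35.3 s ≈ 29 MB/s for the single-client put. For steadier figures than a 5-second run gives, a longer write bench followed by a sequential-read bench over the same objects is one option -- a sketch using standard rados bench modes, with the pool name from above:)

    # 60-second write benchmark, keeping the objects for a read test
    rados -p images bench 60 write -t 16 --no-cleanup
    # sequential read of the objects written above
    rados -p images bench 60 seq -t 16
    # note: --no-cleanup leaves benchmark objects in the pool; remove them
    # afterwards (e.g. "rados -p images cleanup", if your build supports it)

Note that the 5-second runs above show a Min bandwidth of 0 and a large stddev, so a longer run should give more representative averages.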