Hello,

[reduced to ceph-users]

On Sat, 27 Sep 2014 19:17:22 +0400 Timur Nurlygayanov wrote:

> Hello all,
>
> I installed OpenStack with Glance + Ceph OSD with replication factor 2
> and now I can see the write operations are extremely slow.
> For example, I can see only 0.04 MB/s write speed when I run rados bench
> with 512b blocks:
>
> rados bench -p test 60 write --no-cleanup -t 1 -b 512
>
There are 2 things wrong with this test:

1. You're using rados bench, when in fact you should be testing from
within VMs. For starters a VM could make use of the rbd cache you
enabled, rados bench won't.

2. Given the parameters of this test you're testing network latency more
than anything else.

If you monitor the Ceph nodes (atop is a good tool for that), you will
probably see that neither CPU nor disk resources are being exhausted.

With a single thread rados bench puts that tiny block of 512 bytes on the
wire, the primary OSD for the PG has to write this to the journal (on
your slow, non-SSD disks) and send it to the secondary OSD, which has to
ACK the write to its journal back to the primary one, which in turn then
ACKs it to the client (rados bench), and only then can rados bench send
the next packet. You get the drift.

Using your parameters I get 0.17MB/s on a pre-production cluster that
uses 4xQDR Infiniband (IPoIB) connections; on my shitty test cluster with
1Gb/s links I get similar results to you, unsurprisingly.

Ceph excels only with lots of parallelism, so an individual thread might
be slow (and in your case HAS to be slow, which has nothing to do with
Ceph per se) but many parallel ones will utilize the resources available.
Having data blocks that are adequately sized (4MB, the default rados
size) will help for bandwidth, and the rbd cache inside a properly
configured VM should make that happen. Of course in most real life
scenarios you will run out of IOPS long before you run out of bandwidth.

> Maintaining 1 concurrent writes of 512 bytes for up to 60 seconds or 0
> objects
> Object prefix: benchmark_data_node-17.domain.tld_15862
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>     0       0         0         0         0         0         -         0
>     1       1        83        82 0.0400341 0.0400391  0.008465 0.0120985
>     2       1       169       168 0.0410111 0.0419922  0.080433 0.0118995
>     3       1       240       239 0.0388959  0.034668  0.008052 0.0125385
>     4       1       356       355 0.0433309 0.0566406   0.00837 0.0112662
>     5       1       472       471 0.0459919 0.0566406  0.008343 0.0106034
>     6       1       550       549 0.0446735 0.0380859  0.036639 0.0108791
>     7       1       581       580 0.0404538 0.0151367  0.008614 0.0120654
>
>
> My test environment configuration:
> Hardware servers with 1Gb network interfaces, 64Gb RAM and 16 CPU cores
> per node, HDDs WDC WD5003ABYX-01WERA0.
>
For anything production, consider faster network connections and SSD
journals.

> OpenStack with 1 controller, 1 compute and 2 ceph nodes (ceph on
> separate nodes).
> CentOS 6.5, kernel 2.6.32-431.el6.x86_64.
>
You will probably want a 3.14 or 3.16 kernel for various reasons.

Regards,

Christian

> I tested several config options for optimizations, like in
> /etc/ceph/ceph.conf:
>
> [default]
> ...
> osd_pool_default_pg_num = 1024
> osd_pool_default_pgp_num = 1024
> osd_pool_default_flag_hashpspool = true
> ...
> [osd]
> osd recovery max active = 1
> osd max backfills = 1
> filestore max sync interval = 30
> filestore min sync interval = 29
> filestore flusher = false
> filestore queue max ops = 10000
> filestore op threads = 16
> osd op threads = 16
> ...
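
None of these filestore/osd knobs change the single-thread latency math
above. As a rough sanity check, using nothing but the numbers from your
own bench output: with an average latency of about 0.012s per operation a
single thread completes roughly 82 writes per second, and

  82 ops/s x 512 bytes ~= 0.04 MB/s

which is exactly the avg MB/s rados bench reports. To get an idea of what
the cluster itself is capable of, try something along the lines of

  rados bench -p test 60 write --no-cleanup -t 32

(pool name taken from your command, 32 threads is just an illustration,
object size left at the 4MB default), and then compare that with tests
run from inside a properly configured VM as mentioned above.
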
> [client]
> rbd_cache = true
> rbd_cache_writethrough_until_flush = true
>
> and in /etc/cinder/cinder.conf:
>
> [DEFAULT]
> volume_tmp_dir=/tmp
>
> but as a result performance increased by only ~30%, which doesn't look
> like a huge success.
>
> Non-default mount options and TCP optimization increased the speed by
> about 1%:
>
> [root@node-17 ~]# mount | grep ceph
> /dev/sda4 on /var/lib/ceph/osd/ceph-0 type xfs
> (rw,noexec,nodev,noatime,nodiratime,user_xattr,data=writeback,barrier=0)
>
> [root@node-17 ~]# cat /etc/sysctl.conf
> net.core.rmem_max = 16777216
> net.core.wmem_max = 16777216
> net.ipv4.tcp_rmem = 4096 87380 16777216
> net.ipv4.tcp_wmem = 4096 65536 16777216
> net.ipv4.tcp_window_scaling = 1
> net.ipv4.tcp_timestamps = 1
> net.ipv4.tcp_sack = 1
>
>
> Do we have other ways to significantly improve Ceph storage performance?
> Any feedback and comments are welcome!
>
> Thank you!
>

-- 
Christian Balzer        Network/Systems Engineer
chibi at gol.com         Global OnLine Japan/Fusion Communications
http://www.gol.com/