On Thu, Oct 10, 2013 at 12:47 PM, Sergey Pimkov <sergey.pimkov@xxxxxxxxx> wrote:
> Hello!
>
> I'm testing a small Ceph pool consisting of a few SSD drives (no
> spinners). The Ceph version is 0.67.4. Write performance of this
> configuration seems worse than it could be when I test it with a small
> block size (4k).
>
> Pool configuration:
> 2 mons on separate hosts, one host with two OSDs. The first partition of
> each disk is 20GB and used for the journal; the second is formatted as
> XFS and used for data (mount options:
> rw,noexec,nodev,noatime,nodiratime,inode64). 20% of the space is left
> unformatted. Journal aio and dio are turned on.
>
> Each disk does about 15k IOPS with 4k blocks at iodepth 1 and about 50k
> IOPS with 4k blocks at iodepth 16 (tested with fio). Sequential
> throughput of the disks is about 420MB/s. Network throughput is 1Gbit/s.
>
> I use an rbd pool with size 1 and want this pool to act like RAID0 for
> now.
>
> A virtual machine (QEMU/KVM) on a separate host is configured to use a
> 100GB RBD as its second disk. Fio running in this machine (iodepth 16,
> buffered=0, direct=1, libaio, 4k randwrite) shows about 2.5-3k IOPS.
> Multiple guests with the same configuration show a similar combined
> result. A local kernel RBD on the host with the OSDs also shows about
> 2-2.5k IOPS. Latency is about 7ms.

You need to figure out where this is coming from. The OSD does have some
internal queueing that can add up to a millisecond or so of latency, but
7ms of latency is far more than you should be getting on an SSD. You also
aren't putting enough concurrency on the disks: with 16 in-flight ops
against two disks, that's 8 each, and since you're traversing the network
it looks a lot more like 1 IO queued against the SSD than 16.
All that said, Ceph is a distributed storage system that is respecting the
durability constraints you give it, so you aren't going to get IOPS
numbers matching a good local SSD without a big investment.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

> I also tried to pre-fill the RBD, with no change in the results.
>
> Atop shows about 90% disk utilization during the tests. CPU utilization
> is about 400% (2x Xeon E5504 are installed in the ceph node). There is a
> lot of free memory on the host. Blktrace shows that about 4k operations
> (4k to about 40k bytes in size) complete every second on every disk. OSD
> throughput is about 30 MB/s.
>
> I expected to see about 2 x 50k/4 = 20-30k IOPS on the RBD, so is that
> too optimistic for Ceph under such a load, or have I missed something
> important?
> I also tried using one disk as the journal (20GB, with the remaining
> space left unformatted) and configuring the next disk as the OSD; this
> configuration showed almost the same result.
>
> Playing with some osd/filestore/journal options through the admin socket
> produced no improvement.
>
> Please tell me, am I doing something wrong with this setup? Should I use
> more disks to get better performance with small concurrent writes? Or is
> Ceph optimized for slow spinners and not meant to be used with SSD-only
> setups?
> Thank you very much in advance!
>
> My ceph configuration:
> ceph.conf
> ==========================================================================
> [global]
>
> auth cluster required = none
> auth service required = none
> auth client required = none
>
> [client]
>
> rbd cache = true
> rbd cache max dirty = 0
>
> [osd]
>
> osd journal aio = true
> osd max backfills = 4
> osd recovery max active = 1
> filestore max sync interval = 5
>
> [mon.1]
>
> host = ceph1
> mon addr = 10.10.0.1:6789
>
> [mon.2]
>
> host = ceph2
> mon addr = 10.10.0.2:6789
>
> [osd.72]
> host = ceph7
> devs = /dev/sdd2
> osd journal = /dev/sdd1
>
> [osd.73]
> host = ceph7
> devs = /dev/sde2
> osd journal = /dev/sde1

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
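
For reference, the guest-side fio run described in the original message
corresponds roughly to a command line like the one below. The target device
/dev/vdb and the job name are assumptions, since the exact invocation isn't
shown in the thread:

# 4k random writes via libaio, O_DIRECT, 16 in-flight IOs against the
# RBD-backed second disk of the guest (assumed to appear as /dev/vdb)
fio --name=rbd-4k-randwrite --filename=/dev/vdb --rw=randwrite --bs=4k \
    --ioengine=libaio --iodepth=16 --direct=1 --buffered=0 \
    --runtime=60 --time_based --group_reporting

Raising the iodepth or adding --numjobs is the simplest way to test Greg's
point about concurrency: 16 in-flight IOs split across two OSDs, with a
network round-trip in front of each, leaves each SSD mostly idle.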
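One way to act on Greg's suggestion to find where the latency comes from is
to read the per-OSD counters over the admin socket that the original poster
already mentions. This is only a sketch; it assumes the default socket path
for osd.72 from the configuration above:

# Per-OSD performance counters, including op and journal latency averages
ceph --admin-daemon /var/run/ceph/ceph-osd.72.asok perf dump

# The slowest recent ops with per-stage timestamps, where the build supports it
ceph --admin-daemon /var/run/ceph/ceph-osd.72.asok dump_historic_ops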