On Thu, Oct 10, 2013 at 12:47 PM, Sergey Pimkov <sergey.pimkov@xxxxxxxxx> wrote:
> Hello!
>
> I'm testing a small Ceph pool consisting of a few SSD drives (no
> spinners). The Ceph version is 0.67.4. Write performance of this
> configuration seems worse than it could be when I test it with a small
> block size (4k).
>
> Pool configuration:
> 2 mons on separate hosts, one host with two OSDs. The first partition of
> each disk is 20GB and used for the journal; the second is formatted as
> XFS and used for data (mount options:
> rw,noexec,nodev,noatime,nodiratime,inode64). 20% of the space is left
> unformatted. Journal aio and dio are turned on.
>
> Each disk does about 15k IOPS with 4k blocks at iodepth 1 and about 50k
> IOPS with 4k blocks at iodepth 16 (tested with fio). Sequential
> throughput of the disks is about 420MB/s. Network throughput is 1Gbit/s.
>
> I use an rbd pool with size 1 and want this pool to act like RAID0 for
> now.
>
> A virtual machine (QEMU/KVM) on a separate host is configured to use a
> 100GB RBD as its second disk. Fio running in this machine (iodepth 16,
> buffered=0, direct=1, libaio, 4k randwrite) shows about 2.5-3k IOPS.
> Multiple guests with the same configuration show a similar combined
> result. A local kernel RBD on the host with the OSDs also shows about
> 2-2.5k IOPS. Latency is about 7ms.

You need to figure out where this is coming from. The OSD does have some
internal queueing that can add up to a millisecond or so of latency, but
7ms of latency is far more than you should be getting on an SSD. You also
aren't putting enough concurrency on the disks: with 16 in-flight ops
against two disks, that's 8 each, and since you're traversing the network
it looks a lot more like 1 IO queued against the SSD than 16.
All that said, Ceph is a distributed storage system that is respecting the
durability constraints you give it, so you aren't going to get IOPS
numbers matching a good local SSD without a big investment.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

> I also tried to pre-fill the RBD, with no change in the results.
>
> Atop shows about 90% disk utilization during the tests. CPU utilization
> is about 400% (2x Xeon E5504 are installed in the ceph node). There is a
> lot of free memory on the host. Blktrace shows that about 4k operations
> (4k to about 40k bytes in size) complete every second on every disk. OSD
> throughput is about 30 MB/s.
>
> I expected to see about 2 x 50k/4 = 20-30k IOPS on the RBD, so is that
> too optimistic for Ceph under such a load, or have I missed something
> important?
> I also tried using one disk as the journal (20GB, with the remaining
> space left unformatted) and configuring the next disk as the OSD; this
> configuration showed almost the same result.
>
> Playing with some osd/filestore/journal options through the admin socket
> produced no improvement.
>
> Please tell me, am I doing something wrong with this setup? Should I use
> more disks to get better performance with small concurrent writes? Or is
> Ceph optimized for slow spinners and not meant to be used with SSD-only
> setups?
> Thank you very much in advance!
>
> My ceph configuration:
> ceph.conf
> ==========================================================================
> [global]
>
> auth cluster required = none
> auth service required = none
> auth client required = none
>
> [client]
>
> rbd cache = true
> rbd cache max dirty = 0
>
> [osd]
>
> osd journal aio = true
> osd max backfills = 4
> osd recovery max active = 1
> filestore max sync interval = 5
>
> [mon.1]
>
> host = ceph1
> mon addr = 10.10.0.1:6789
>
> [mon.2]
>
> host = ceph2
> mon addr = 10.10.0.2:6789
>
> [osd.72]
> host = ceph7
> devs = /dev/sdd2
> osd journal = /dev/sdd1
>
> [osd.73]
> host = ceph7
> devs = /dev/sde2
> osd journal = /dev/sde1

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
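
For reference, the guest-side fio run described in the original message
corresponds roughly to a command line like the one below. The target device
/dev/vdb and the job name are assumptions, since the exact invocation isn't
shown in the thread:

# 4k random writes via libaio, O_DIRECT, 16 in-flight IOs against the
# RBD-backed second disk of the guest (assumed to appear as /dev/vdb)
fio --name=rbd-4k-randwrite --filename=/dev/vdb --rw=randwrite --bs=4k \
    --ioengine=libaio --iodepth=16 --direct=1 --buffered=0 \
    --runtime=60 --time_based --group_reporting

Raising the iodepth or adding --numjobs is the simplest way to test Greg's
point about concurrency: 16 in-flight IOs split across two OSDs, with a
network round-trip in front of each, leaves each SSD mostly idle.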
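One way to act on Greg's suggestion to find where the latency comes from is
to read the per-OSD counters over the admin socket that the original poster
already mentions. This is only a sketch; it assumes the default socket path
for osd.72 from the configuration above:

# Per-OSD performance counters, including op and journal latency averages
ceph --admin-daemon /var/run/ceph/ceph-osd.72.asok perf dump

# The slowest recent ops with per-stage timestamps, where the build supports it
ceph --admin-daemon /var/run/ceph/ceph-osd.72.asok dump_historic_ops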