On 10/10/2013 02:47 PM, Sergey Pimkov wrote:
Hello! I'm testing a small Ceph pool consisting of a few SSD drives (no spinners). The Ceph version is 0.67.4. Write performance of this configuration seems lower than it should be when I test it with a small block size (4k).

Pool configuration: 2 mons on separate hosts, one host with two OSDs. The first partition of each disk is used for the journal and is 20GB; the second is formatted as XFS and used for data (mount options: rw,noexec,nodev,noatime,nodiratime,inode64). 20% of the space is left unformatted. Journal aio and dio are turned on. Each disk does about 15k IOPS with 4k blocks at iodepth 1 and 50k IOPS with 4k blocks at iodepth 16 (tested with fio). Linear throughput of the disks is about 420MB/s. Network throughput is 1Gbit/s.

I use an rbd pool with size 1 and want this pool to act like RAID0 for now. A virtual machine (QEMU/KVM) on a separate host is configured to use a 100GB RBD as its second disk. Fio running in this machine (iodepth 16, buffered=0, direct=1, libaio, 4k randwrite) shows about 2.5-3k IOPS. Multiple guests with the same configuration show a similar aggregate result. Local kernel RBD on the host with the OSDs also shows about 2-2.5k IOPS. Latency is about 7ms. I also tried pre-filling the RBD, with no change.

Atop shows about 90% disk utilization during the tests. CPU utilization is about 400% (2x Xeon E5504 installed on the ceph node). There is a lot of free memory on the host. Blktrace shows about 4k operations (4k to about 40k bytes) completing every second on every disk. OSD throughput is about 30MB/s.
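The fio run inside the guest is roughly equivalent to the following command line (the guest device name /dev/vdb is an assumption for the RBD-backed second disk; adjust to your setup):

fio --name=rbd-4k-randwrite --filename=/dev/vdb --rw=randwrite --bs=4k \
    --ioengine=libaio --iodepth=16 --direct=1 --buffered=0 \
    --runtime=60 --time_based --group_reporting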
Hi! The first thing to try is disabling all in-memory debugging. I'm not sure how much it will help, but it should give you something.
debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug hadoop = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
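Those go in the [global] section of ceph.conf. If you'd rather try it without restarting the daemons, injecting the settings at runtime should also work on dumpling, something along these lines (OSD IDs taken from your config; extend the option list to match the block above):

ceph tell osd.72 injectargs '--debug-osd 0/0 --debug-filestore 0/0 --debug-journal 0/0 --debug-ms 0/0'
ceph tell osd.73 injectargs '--debug-osd 0/0 --debug-filestore 0/0 --debug-journal 0/0 --debug-ms 0/0'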
I expected to see about 2 x 50k/4 = 20-30k IOPS on RBD. Is that too optimistic for Ceph under this kind of load, or have I missed something important? I also tried using one disk as the journal (20GB, remaining space left unformatted) and configuring the other disk as the OSD; that configuration showed almost the same result. Playing with some osd/filestore/journal options over the admin socket made no difference either.

Please tell me, am I doing something wrong with this setup? Should I use more disks to get better performance with small concurrent writes? Or is Ceph optimized for slow spinners and not meant to be used with SSDs only?
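For reference, this is roughly how I changed options on the fly (the option and value here are just an illustration; I tried several):

ceph --admin-daemon /var/run/ceph/ceph-osd.72.asok config set filestore_max_sync_interval 10
ceph --admin-daemon /var/run/ceph/ceph-osd.72.asok config show | grep filestore_max_sync_interval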
We definitely have some things to work on for small IO performance. I suspect some of the changes we'll be making over the coming months should help.
Thank you very much in advance!

My ceph configuration:

ceph.conf
==========================================================================
[global]
auth cluster required = none
auth service required = none
auth client required = none

[client]
rbd cache = true
rbd cache max dirty = 0

[osd]
osd journal aio = true
osd max backfills = 4
osd recovery max active = 1
filestore max sync interval = 5

[mon.1]
host = ceph1
mon addr = 10.10.0.1:6789

[mon.2]
host = ceph2
mon addr = 10.10.0.2:6789

[osd.72]
host = ceph7
devs = /dev/sdd2
osd journal = /dev/sdd1

[osd.73]
host = ceph7
devs = /dev/sde2
osd journal = /dev/sde1
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com