Re: SSD pool write performance

james@xxxxxxxxxxxx · Fri, 11 Oct 2013 17:03:34 +0100

Just a thought: did you try setting the noop scheduler for the SSDs?
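For reference, here is a minimal Python sketch for checking and switching the scheduler. It assumes a Linux kernel that exposes the I/O scheduler in `/sys/block/<dev>/queue/scheduler` and marks the active one in square brackets (for example `noop deadline [cfq]`). The helper names are hypothetical, writing the file needs root, and the setting does not survive a reboot.

```python
from pathlib import Path


def active_scheduler(text: str) -> str:
    """Return the active (bracketed) entry from a sysfs scheduler line."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active scheduler marked in {text!r}")


def set_scheduler(dev: str, name: str = "noop") -> str:
    """Hypothetical helper: select `name` for block device `dev` (needs root).

    Returns the scheduler the kernel reports as active after the write.
    """
    path = Path("/sys/block") / dev / "queue" / "scheduler"
    available = path.read_text()
    offered = available.replace("[", " ").replace("]", " ").split()
    if name not in offered:
        raise ValueError(f"{name} not offered for {dev}: {available.strip()}")
    path.write_text(name)
    return active_scheduler(path.read_text())
```

On the OSD host, `set_scheduler("sdd")` and `set_scheduler("sde")` would cover the two OSD disks named in the ceph.conf further down the thread.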

I guess the journal is written uncached(?), so putting the SSDs behind a
BBWC (battery-backed write cache) might help by reducing write latency to
near zero. The wear rate on the SSDs might also be lower (if journal IO
straddles physical cells).

On 2013-10-11 16:55, Gregory Farnum wrote:
On Thu, Oct 10, 2013 at 12:47 PM, Sergey Pimkov <sergey.pimkov@xxxxxxxxx> wrote:
Hello!

I'm testing a small Ceph pool consisting of a few SSD drives (without any
spinners). The Ceph version is 0.67.4. The write performance of this
configuration seems worse than it could be when I test it with a small
block size (4k).

Pool configuration:
2 mons on separate hosts, one host with two OSDs. The first partition of
each disk is used for the journal and is 20 GB; the second is formatted as
XFS and used for data (mount options:
rw,noexec,nodev,noatime,nodiratime,inode64). 20% of the space is left
unformatted. Journal aio and dio are turned on.

Each disk does about 15k IOPS with 4k blocks at iodepth 1 and about 50k
IOPS with 4k blocks at iodepth 16 (tested with fio). Linear throughput of
the disks is about 420 MB/s. Network throughput is 1 Gbit/s.

I use an rbd pool with size 1 and want this pool to act like RAID0 for
now.

A virtual machine (QEMU/KVM) on a separate host is configured to use a
100 GB RBD image as its second disk. fio running in this machine (iodepth
16, buffered=0, direct=1, libaio, 4k randwrite) shows about 2.5-3k IOPS.
Multiple guests with the same configuration show a similar aggregate
result. Local kernel RBD on the host with the OSDs also shows about
2-2.5k IOPS. Latency is about 7 ms.
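To make the guest test easy to repeat, here is a small Python sketch that builds an fio command line from the parameters listed above. The job name and the target `/dev/vdb` are assumptions (a common name for a second virtio disk). Random writes against a raw device destroy its contents, so only point it at a scratch RBD.

```python
import shlex


def fio_randwrite_cmd(filename: str, iodepth: int = 16, runtime_s: int = 60) -> list[str]:
    """Build an fio command for 4k random writes with O_DIRECT via libaio."""
    return [
        "fio",
        "--name=rbd-4k-randwrite",    # hypothetical job name
        f"--filename={filename}",     # e.g. the guest's second disk
        "--ioengine=libaio",
        "--direct=1",
        "--buffered=0",
        "--rw=randwrite",
        "--bs=4k",
        f"--iodepth={iodepth}",
        f"--runtime={runtime_s}",
        "--time_based",
    ]


# Show the exact command; pass the same list to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(fio_randwrite_cmd("/dev/vdb")))
```

Because the command comes from one list, each run is identical. Varying `iodepth` (for example 1, 16, 64) would show how strongly the result depends on concurrency.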

You need to figure out where this is coming from. The OSD does have
some internal queueing that can add up to a millisecond or so of
latency, but 7ms of latency is far more than you should be getting on
an SSD.

You also aren't putting enough concurrency on the disks: with 16
in-flight ops against two disks, that's 8 each, plus you're traversing
the network, so to the SSD it looks a lot more like 1 IO queued than 16.
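Little's law (requests in flight = throughput × mean latency) ties these numbers together. The following quick Python check uses the figures from this thread and assumes the queue stays full at steady state.

```python
def littles_law_latency_s(in_flight: float, iops: float) -> float:
    """Mean per-request latency implied by Little's law: L = lambda * W, so W = L / lambda."""
    return in_flight / iops


# Figures reported in this thread.
ssd_qd1_s = littles_law_latency_s(1, 15_000)    # raw SSD, fio iodepth 1
rbd_slow_s = littles_law_latency_s(16, 2_500)   # VM on RBD, iodepth 16, low end
rbd_fast_s = littles_law_latency_s(16, 3_000)   # VM on RBD, iodepth 16, high end
per_osd_in_flight = 16 / 2                      # 16 ops spread over two OSDs

for label, seconds in [
    ("SSD, qd1", ssd_qd1_s),
    ("RBD, 2.5k IOPS", rbd_slow_s),
    ("RBD, 3k IOPS", rbd_fast_s),
]:
    print(f"{label:15s} {seconds * 1e3:6.3f} ms")
print("in flight per OSD:", per_osd_in_flight)
```

At 2.5-3k IOPS with 16 requests in flight, the implied mean latency is about 5.3 to 6.4 ms, in line with the ~7 ms measured. The raw SSD at iodepth 1 works out to roughly 0.07 ms. That gap of nearly two orders of magnitude is the latency that needs to be tracked down.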

All that said, Ceph is a distributed storage system that respects the
durability constraints you give it; you aren't going to get IOPS numbers
matching a good local SSD without a big investment.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

I also tried pre-filling the RBD image, with no effect.

Atop shows about 90% disk utilization during the tests. CPU utilization
is about 400% (2x Xeon E5504 are installed in the Ceph node). There is a
lot of free memory on the host. Blktrace shows about 4k operations (4k to
about 40k bytes each) completing every second on every disk. OSD
throughput is about 30 MB/s.
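The blktrace and throughput figures can be compared with what the clients actually send. The rough Python sketch below assumes client IO spreads evenly over the two OSDs and that "about 30 MB/s" is per OSD (the thread does not say which). With FileStore, each client write lands first in the journal and then in the data partition, both on the same SSD here. That means at least 2x amplification is expected even before XFS metadata is counted.

```python
def amplification(client_iops_total: float, num_osds: int, block_bytes: int,
                  disk_ops_per_s: float, disk_bytes_per_s: float) -> tuple[float, float]:
    """Return (ops, bytes) amplification per OSD disk, assuming an even spread of client IO."""
    client_iops_per_osd = client_iops_total / num_osds
    client_bytes_per_osd = client_iops_per_osd * block_bytes
    return (disk_ops_per_s / client_iops_per_osd,
            disk_bytes_per_s / client_bytes_per_osd)


ops_amp, bytes_amp = amplification(
    client_iops_total=3_000,   # upper end of the fio result inside the VM
    num_osds=2,
    block_bytes=4096,          # 4k random writes
    disk_ops_per_s=4_000,      # blktrace: ~4k completions/s per disk
    disk_bytes_per_s=30e6,     # "about 30 MB/s", assumed to be per OSD
)
print(f"ops x{ops_amp:.1f}, bytes x{bytes_amp:.1f}")
```

With these inputs, the disks see roughly 2.7 times as many operations and about 4.9 times as many bytes as the clients submit.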

I expected to see about 2 x 50k / 4 = 25k (somewhere in the 20-30k
range) IOPS on RBD. Is that too optimistic for Ceph under such a load, or
have I missed something important? I also tried using one disk as the
journal (20 GB, the rest left unformatted) and the other disk as the OSD;
this configuration showed almost the same result.
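The estimate above can also be checked against the network. This Python sketch of the arithmetic keeps the author's divisor of 4 as given. It assumes decimal units for the 1 Gbit/s link and ignores all protocol overhead. With pool size 1, each client write crosses the network once.

```python
GBIT_PER_S = 1_000_000_000  # 1 Gbit/s link, decimal units


def author_estimate(num_disks: int, disk_iops_qd16: float, divisor: float = 4) -> float:
    """Reproduce the estimate from the thread: num_disks * IOPS / divisor."""
    return num_disks * disk_iops_qd16 / divisor


def network_ceiling_iops(link_bits_per_s: float, block_bytes: int) -> float:
    """Upper bound on write IOPS over one link, ignoring all protocol overhead."""
    return link_bits_per_s / 8 / block_bytes


estimate = author_estimate(2, 50_000)            # 2 SSDs at ~50k IOPS each (iodepth 16)
ceiling = network_ceiling_iops(GBIT_PER_S, 4096)
observed = 3_000                                 # upper end seen from the VM

print(f"estimate {estimate:,.0f}  network ceiling {ceiling:,.0f}  observed {observed:,}")
```

A single 1 Gbit/s link can carry roughly 30k 4 KiB writes per second before overhead. Bandwidth alone therefore does not explain a ceiling of 2.5-3k IOPS, and per-request latency seems the more likely limit.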

Playing with some osd/filestore/journal options via the admin socket had
no effect.
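To keep experiments with those options repeatable, here is a Python sketch of driving the admin socket. It assumes the default socket path `/var/run/ceph/$cluster-$name.asok` (here `ceph-osd.72.asok`) and the `ceph --admin-daemon <socket> config show|set` commands. The helper names are hypothetical, and `config show` is assumed to print a flat JSON object mapping option names to values.

```python
import json
import subprocess


def asok_path(osd_id: int, cluster: str = "ceph") -> str:
    """Assumed default admin socket location for an OSD."""
    return f"/var/run/ceph/{cluster}-osd.{osd_id}.asok"


def show_config_cmd(osd_id: int) -> list[str]:
    return ["ceph", "--admin-daemon", asok_path(osd_id), "config", "show"]


def set_config_cmd(osd_id: int, key: str, value: str) -> list[str]:
    return ["ceph", "--admin-daemon", asok_path(osd_id), "config", "set", key, value]


def options_with_prefix(config_json: str,
                        prefixes: tuple[str, ...] = ("osd_", "filestore_", "journal_")) -> dict:
    """Filter the JSON printed by `config show` down to the tunables of interest."""
    config = json.loads(config_json)
    return {k: v for k, v in config.items() if k.startswith(prefixes)}


def run(cmd: list[str]) -> str:
    """Run a command and return its stdout (needs the ceph CLI and socket access on the OSD host)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

On the OSD host, `options_with_prefix(run(show_config_cmd(72)))` would list the current osd/filestore/journal values. Recording those before and after each fio run shows exactly which settings were in effect.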

Please tell me: is something wrong with this setup? Should I use more
disks to get better performance with small concurrent writes? Or is Ceph
optimized for slow spinners, so that it shouldn't be used with SSDs only?
Thank you very much in advance!

My ceph configuration:
ceph.conf

==========================================================================
[global]

  auth cluster required = none
  auth service required = none
  auth client required = none

[client]

  rbd cache = true
  rbd cache max dirty = 0

[osd]

  osd journal aio = true
  osd max backfills = 4
  osd recovery max active = 1
  filestore max sync interval = 5

[mon.1]

  host = ceph1
  mon addr = 10.10.0.1:6789

[mon.2]

  host = ceph2
  mon addr = 10.10.0.2:6789

[osd.72]
  host = ceph7
  devs = /dev/sdd2
  osd journal = /dev/sdd1

[osd.73]
  host = ceph7
  devs = /dev/sde2
  osd journal = /dev/sde1
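One detail in the `[client]` section: according to the Ceph documentation, `rbd cache max dirty = 0` makes the RBD cache write-through, so guest writes are not acknowledged until they reach the OSDs. The minimal Python sketch below reads the subset of ceph.conf syntax used above and reports the resulting cache mode. The parser is hypothetical and handles only `[section]` headers and `key = value` lines. It treats spaces in option names as underscores, since Ceph accepts both spellings.

```python
def parse_ceph_conf(text: str) -> dict[str, dict[str, str]]:
    """Tiny reader for [section] headers and `key = value` lines (no includes or metavariables)."""
    sections: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].strip(), {})
            continue
        if current is not None and "=" in line:
            key, value = line.split("=", 1)
            current["_".join(key.split())] = value.strip()  # "rbd cache" -> "rbd_cache"
    return sections


def rbd_cache_mode(conf: dict[str, dict[str, str]]) -> str:
    client = conf.get("client", {})
    if client.get("rbd_cache", "false").lower() != "true":  # treated as off unless set
        return "disabled"
    # A dirty limit of 0 means nothing is held dirty in the cache: write-through.
    return "write-through" if client.get("rbd_cache_max_dirty") == "0" else "write-back"


SAMPLE = """
[client]
  rbd cache = true
  rbd cache max dirty = 0

[osd.72]
  host = ceph7
  devs = /dev/sdd2
  osd journal = /dev/sdd1
"""

mode = rbd_cache_mode(parse_ceph_conf(SAMPLE))
print(mode)
```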

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com