SSD pool write performance

Hello!

I'm testing a small Ceph pool consisting of a few SSD drives (no spinners). The Ceph version is 0.67.4. Write performance of this configuration seems lower than it should be when I test it with a small block size (4k).

Pool configuration:
2 mons on separate hosts, and one host with two OSDs. The first partition of each disk is used for the journal and is 20 GB in size; the second is formatted as XFS and used for data (mount options: rw,noexec,nodev,noatime,nodiratime,inode64). 20% of the space is left unformatted. Journal aio and dio are turned on.

Each disk delivers about 15k IOPS with 4k blocks at iodepth 1, and about 50k IOPS with 4k blocks at iodepth 16 (tested with fio). Linear throughput of each disk is about 420 MB/s. Network throughput is 1 Gbit/s.

I use the rbd pool with size 1 and want this pool to act like RAID0 for now.

A virtual machine (QEMU/KVM) on a separate host is configured to use a 100 GB RBD image as its second disk. Fio running in this machine (iodepth 16, buffered=0, direct=1, libaio, 4k randwrite) shows about 2.5-3k IOPS. Multiple guests with the same configuration show a similar aggregate result. A local kernel RBD on the host with the OSDs also shows about 2-2.5k IOPS. Latency is about 7 ms. I also tried pre-filling the RBD, with no effect.
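
As a rough cross-check of the numbers above: with a fixed queue depth, sustained IOPS is approximately the queue depth divided by the average per-request latency (Little's law). The sketch below is only an illustration using the figures from this message (iodepth 16, about 7 ms); the variable names are made up for the example, and it assumes the 7 ms figure is the average completion latency seen by fio.

```python
# Little's law: throughput = requests in flight / average latency.
# Figures are taken from the test description above; names are illustrative.
iodepth = 16            # fio queue depth used in the test
latency_s = 0.007       # about 7 ms, assumed to be the average latency per request

estimated_iops = iodepth / latency_s
print(f"{estimated_iops:.0f} IOPS")  # about 2286
```

That lands inside the observed 2-2.5k IOPS range, which suggests the result is bounded by per-request latency rather than by raw disk bandwidth.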

Atop shows about 90% disk utilization during the tests. CPU utilization is about 400% (2x Xeon E5504 are installed in the Ceph node). There is plenty of free memory on the host. Blktrace shows about 4k operations (from 4k to about 40k bytes each) completing every second on each disk. OSD throughput is about 30 MB/s.

I expected to see about 2 x 50k/4 = 20-30k IOPS on RBD. Is that too optimistic for Ceph under such a load, or have I missed something important? I also tried using one disk as the journal (20 GB, the rest left unformatted) and configuring the other disk as the OSD; this configuration showed almost the same result.
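
For reference, the arithmetic behind that expectation can be written out as below. This is only a sketch: it takes 2.5k IOPS as the observed value (the low end of the range reported above), and the network ceiling ignores all protocol overhead, so the real limit of a 1 Gbit/s link would be lower.

```python
# Expected vs. observed 4k random-write IOPS, using figures from this message.
BLOCK_BYTES = 4096

expected_iops = 2 * 50_000 / 4          # two SSDs at 50k IOPS each, divided by 4
observed_iops = 2_500                   # low end of the 2.5-3k IOPS measured in the VM

# Upper bound a 1 Gbit/s link allows for 4k writes (no protocol overhead assumed).
link_bytes_per_s = 1e9 / 8
network_ceiling_iops = link_bytes_per_s / BLOCK_BYTES

fraction = observed_iops / expected_iops
print(f"expected:        {expected_iops:.0f} IOPS")
print(f"network ceiling: {network_ceiling_iops:.0f} IOPS")
print(f"observed is {fraction:.0%} of expected")
```

So the measured result is roughly a tenth of what I hoped for, and the expected figure itself is already close to what the 1 Gbit/s network could carry.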

Tuning various osd/filestore/journal options through the admin socket made no difference.

Please tell me, is something wrong with this setup? Or should I use more disks to get better performance with small concurrent writes? Or is Ceph optimized for slow spinners and not suited to SSD-only setups?
Thank you very much in advance!

My ceph configuration:
ceph.conf ==========================================================================
[global]

  auth cluster required = none
  auth service required = none
  auth client required = none

[client]

  rbd cache = true
  rbd cache max dirty = 0

[osd]

  osd journal aio = true
  osd max backfills = 4
  osd recovery max active = 1
  filestore max sync interval = 5

[mon.1]

  host = ceph1
  mon addr = 10.10.0.1:6789

[mon.2]

  host = ceph2
  mon addr = 10.10.0.2:6789

[osd.72]
  host = ceph7
  devs = /dev/sdd2
  osd journal = /dev/sdd1

[osd.73]
  host = ceph7
  devs = /dev/sde2
  osd journal = /dev/sde1

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



