Performance doesn't scale well on a full ssd cluster.

Mark Wu <wudx05@xxxxxxxxx> · Fri, 17 Oct 2014 00:18:22 +0800

Hi list,

During my tests, I found that Ceph doesn't scale as I expected on a 30-OSD cluster.
Here are the details of my setup:
HW configuration:
   15 Dell R720 servers, and each server has:
      Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 20 cores and hyper-thread enabled.
      128GB memory
      two Intel 3500 SSD disks connected to a MegaRAID SAS 2208 controller, each disk configured as a separate single-disk RAID0.
      two bonded 10GbE NICs, used for both the public network and the cluster network.

SW configuration:
   OS CentOS 6.5, Kernel 3.17,  Ceph 0.86
   XFS as file system for data.
   each SSD disk has two partitions, one is osd data and the other is osd journal.
   the pool has 2048 pgs. 2 replicas. 
   5 monitors running on 5 of the 15 servers.
   Ceph configuration (in-memory debugging options are disabled):

[osd]
osd data =
osd journal = /var/lib/ceph/osd/$cluster-$id/journal
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = rw,noatime,logbsize=256k,delaylog
osd journal size = 20480
osd mon heartbeat interval = 30
# Performance tuning
osd_max_backfills = 10
osd_recovery_max_active = 15
filestore merge threshold = 40
filestore split multiple = 8
filestore fd cache size = 1024
osd op threads = 64
# Recovery tuning
osd recovery max active = 1
osd max backfills = 1
osd recovery op priority = 1
throttler perf counter = false
osd enable op tracker = false
filestore_queue_max_ops = 5000
filestore_queue_committing_max_ops = 5000
journal_max_write_entries = 1000
journal_queue_max_ops = 5000
objecter_inflight_ops = 8192
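
The option list above appears to set max backfills and recovery max active twice with different values (10 and 15 in one place, 1 and 1 in another). A minimal sketch for spotting such duplicates is shown here. It assumes that Ceph treats spaces, underscores and hyphens in option names as interchangeable; the `normalize` and `find_conflicts` helpers and the `SAMPLE` excerpt are illustrative only, and the sketch does not say which value actually takes effect.

```python
from collections import defaultdict

# Excerpt of the [osd] options from the post, one option per line.
SAMPLE = """\
osd_max_backfills = 10
osd_recovery_max_active = 15
osd op threads = 64
osd recovery max active = 1
osd max backfills = 1
osd recovery op priority = 1
"""


def normalize(name):
    # Assumption: spaces, underscores and hyphens are equivalent in option names.
    return "_".join(name.strip().replace("-", " ").replace("_", " ").split())


def find_conflicts(text):
    """Return {option: [values...]} for options given more than one distinct value."""
    seen = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        seen[normalize(key)].append(value.strip())
    return {k: v for k, v in seen.items() if len(set(v)) > 1}


conflicts = find_conflicts(SAMPLE)
for key, values in conflicts.items():
    print(f"{key}: {values}")
```

Running the same check over the full ceph.conf on each node would show whether the duplicated settings are present in the deployed file or only in this pasted copy.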

When I tested with 7 servers (14 OSDs), the maximum 4k random write IOPS I saw was 17k on a single volume and 44k across the whole cluster.
I expected a 30-OSD cluster to reach roughly 90k. Unfortunately, with 30 OSDs it delivers almost the same performance
as with 14 OSDs, and sometimes worse. I checked the iostat output on all the nodes, and the numbers are similar: the load is well distributed, but disk utilization is low.
In the 14-OSD test, I saw higher disk utilization (80%~90%). Do you have any tuning suggestions to improve performance with 30 OSDs?
Any feedback is appreciated.
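
For reference, the roughly 90k expectation follows from a simple linear extrapolation of the 14-OSD result. The sketch below only reproduces that arithmetic; it assumes perfectly linear scaling per OSD, which is the expectation being questioned, not a measured property.

```python
# Linear-scaling estimate from the measured 14-OSD result.
measured_osds = 14
measured_iops = 44_000  # 4k random write, whole cluster, as reported above
target_osds = 30

per_osd_iops = measured_iops / measured_osds
expected_iops = per_osd_iops * target_osds

print(f"per-OSD IOPS: {per_osd_iops:.0f}")
print(f"expected at {target_osds} OSDs: {expected_iops:.0f}")
```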

iostat output:
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00    88.50    0.00 5188.00     0.00 93397.00    18.00     0.90    0.17   0.09  47.85
sdc               0.00   443.50    0.00 5561.50     0.00 97324.00    17.50     4.06    0.73   0.09  47.90
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    17.50    0.00   28.00     0.00  3948.00   141.00     0.01    0.29   0.05   0.15
sdb               0.00    69.50    0.00 4932.00     0.00 87067.50    17.65     2.27    0.46   0.09  43.45
sdc               0.00    69.00    0.00 4855.50     0.00 105771.50    21.78     0.95    0.20   0.10  46.40
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00   42.50     0.00  3948.00    92.89     0.01    0.19   0.04   0.15

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    12.00    0.00    8.00     0.00   568.00    71.00     0.00    0.12   0.12   0.10
sdb               0.00    72.50    0.00 5046.50     0.00 113198.50    22.43     1.09    0.22   0.10  51.40
sdc               0.00    72.50    0.00 4912.00     0.00 91204.50    18.57     2.25    0.46   0.09  43.60
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00   18.00     0.00   568.00    31.56     0.00    0.17   0.06   0.10
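
To compare nodes more easily, the extended iostat output can be summarized per sample. The sketch below parses the first sample above and reports total writes per second, average write size and mean utilization. It assumes that sdb and sdc are the two SSDs holding the OSD data and journals (the post does not name the devices); `parse_iostat` and `summarize` are hypothetical helper names. iostat reports wsec/s in 512-byte sectors.

```python
SECTOR_BYTES = 512  # iostat reports rsec/s, wsec/s and avgrq-sz in 512-byte sectors

# First iostat sample from the post (sda, sdb, sdc only).
SAMPLE = """\
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00    88.50    0.00 5188.00     0.00 93397.00    18.00     0.90    0.17   0.09  47.85
sdc               0.00   443.50    0.00 5561.50     0.00 97324.00    17.50     4.06    0.73   0.09  47.90
"""


def parse_iostat(text):
    """Parse one `iostat -x` sample into {device: {column: value}}."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header = [h.rstrip(":") for h in lines[0]]
    rows = {}
    for fields in lines[1:]:
        rows[fields[0]] = dict(zip(header[1:], map(float, fields[1:])))
    return rows


def summarize(rows, devices):
    """Return (total w/s, average write size in KiB, mean %util) for the devices."""
    total_w = sum(rows[d]["w/s"] for d in devices)
    total_sectors = sum(rows[d]["wsec/s"] for d in devices)
    avg_kib = total_sectors * SECTOR_BYTES / total_w / 1024 if total_w else 0.0
    mean_util = sum(rows[d]["%util"] for d in devices) / len(devices)
    return total_w, avg_kib, mean_util


rows = parse_iostat(SAMPLE)
total_w, avg_kib, mean_util = summarize(rows, ["sdb", "sdc"])
print(f"writes/s: {total_w:.1f}, avg write: {avg_kib:.2f} KiB, mean util: {mean_util:.2f}%")
```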

Regards,
Mark Wu

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
