[Re-added the list.]

I assume you added more clients and checked that it didn't scale past
that? You might look through the list archives; there are a number of
discussions about how and how far you can scale SSD-backed cluster
performance.

Just scanning through the config options you set, you might want to bump
up all the filestore and journal queue values a lot farther; see the
sketch at the bottom of this mail, below the quoted thread.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Thu, Oct 16, 2014 at 9:51 AM, Mark Wu <wudx05@xxxxxxxxx> wrote:
> Thanks for the reply. I am not using a single client. Writing to 5 rbd
> volumes from 3 hosts can reach the peak. The client is fio, also running
> on the osd nodes, but there are no bottlenecks on cpu or network. I also
> tried running the clients on two non-osd servers, with the same result.
>
> On Oct 17, 2014, at 12:29 AM, "Gregory Farnum" <greg@xxxxxxxxxxx> wrote:
>
>> If you're running a single client to drive these tests, that's your
>> bottleneck. Try running multiple clients and aggregating their numbers.
>> -Greg
>>
>> On Thursday, October 16, 2014, Mark Wu <wudx05@xxxxxxxxx> wrote:
>>>
>>> Hi list,
>>>
>>> During my test, I found Ceph doesn't scale as I expected on a 30-osd
>>> cluster.
>>> The following is the information about my setup:
>>> HW configuration:
>>> 15 Dell R720 servers, each with:
>>>   Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 20 cores with
>>>   hyper-threading enabled.
>>>   128GB memory
>>>   two Intel 3500 SSDs, attached to a MegaRAID SAS 2208 controller,
>>>   each configured as a separate raid0.
>>>   two 10GbE nics in a bond, used for both the public network and the
>>>   cluster network.
>>>
>>> SW configuration:
>>> OS CentOS 6.5, kernel 3.17, Ceph 0.86
>>> XFS as the file system for data.
>>> Each SSD has two partitions: one for osd data and the other for the
>>> osd journal.
>>> The pool has 2048 pgs and 2 replicas.
>>> 5 monitors running on 5 of the 15 servers.
>>> Ceph configuration (in-memory debugging options are disabled):
>>>
>>> [osd]
>>> osd data = /var/lib/ceph/osd/$cluster-$id
>>> osd journal = /var/lib/ceph/osd/$cluster-$id/journal
>>> osd mkfs type = xfs
>>> osd mkfs options xfs = -f -i size=2048
>>> osd mount options xfs = rw,noatime,logbsize=256k,delaylog
>>> osd journal size = 20480
>>> osd mon heartbeat interval = 30
>>> # Performance tuning
>>> osd_max_backfills = 10
>>> osd_recovery_max_active = 15
>>> filestore merge threshold = 40
>>> filestore split multiple = 8
>>> filestore fd cache size = 1024
>>> osd op threads = 64
>>> # Recovery tuning
>>> osd recovery max active = 1
>>> osd max backfills = 1
>>> osd recovery op priority = 1
>>> throttler perf counter = false
>>> osd enable op tracker = false
>>> filestore_queue_max_ops = 5000
>>> filestore_queue_committing_max_ops = 5000
>>> journal_max_write_entries = 1000
>>> journal_queue_max_ops = 5000
>>> objecter_inflight_ops = 8192
>>>
>>>
>>> When I test with 7 servers (14 osds), the maximum 4k random write iops
>>> I see is 17k on a single volume and 44k on the whole cluster.
>>> I expected a 30-osd cluster to approach 90k, but unfortunately with 30
>>> osds it provides almost the same performance as 14 osds, and sometimes
>>> even worse. I checked the iostat output on all the nodes, and they
>>> show similar numbers. The load is well distributed, but disk
>>> utilization is low.
>>> In the test with 14 osds, I see higher disk utilization (80%~90%).
>>> So do you have any tuning suggestions to improve the performance with
>>> 30 osds?
>>> Any feedback is appreciated.
>>>
>>>
>>> iostat output:
>>> Device:  rrqm/s  wrqm/s  r/s   w/s      rsec/s  wsec/s     avgrq-sz  avgqu-sz  await  svctm  %util
>>> sda      0.00    0.00    0.00  0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
>>> sdb      0.00    88.50   0.00  5188.00  0.00    93397.00   18.00     0.90      0.17   0.09   47.85
>>> sdc      0.00    443.50  0.00  5561.50  0.00    97324.00   17.50     4.06      0.73   0.09   47.90
>>> dm-0     0.00    0.00    0.00  0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
>>> dm-1     0.00    0.00    0.00  0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
>>>
>>> Device:  rrqm/s  wrqm/s  r/s   w/s      rsec/s  wsec/s     avgrq-sz  avgqu-sz  await  svctm  %util
>>> sda      0.00    17.50   0.00  28.00    0.00    3948.00    141.00    0.01      0.29   0.05   0.15
>>> sdb      0.00    69.50   0.00  4932.00  0.00    87067.50   17.65     2.27      0.46   0.09   43.45
>>> sdc      0.00    69.00   0.00  4855.50  0.00    105771.50  21.78     0.95      0.20   0.10   46.40
>>> dm-0     0.00    0.00    0.00  0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
>>> dm-1     0.00    0.00    0.00  42.50    0.00    3948.00    92.89     0.01      0.19   0.04   0.15
>>>
>>> Device:  rrqm/s  wrqm/s  r/s   w/s      rsec/s  wsec/s     avgrq-sz  avgqu-sz  await  svctm  %util
>>> sda      0.00    12.00   0.00  8.00     0.00    568.00     71.00     0.00      0.12   0.12   0.10
>>> sdb      0.00    72.50   0.00  5046.50  0.00    113198.50  22.43     1.09      0.22   0.10   51.40
>>> sdc      0.00    72.50   0.00  4912.00  0.00    91204.50   18.57     2.25      0.46   0.09   43.60
>>> dm-0     0.00    0.00    0.00  0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
>>> dm-1     0.00    0.00    0.00  18.00    0.00    568.00     31.56     0.00      0.17   0.06   0.10
>>>
>>>
>>>
>>> Regards,
>>> Mark Wu
>>>
>>
>> --
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
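
As mentioned at the top, here is a rough sketch of the direction I mean
for the queue settings. The exact numbers are only illustrative guesses,
not values I've validated on your hardware, so treat them as a starting
point and measure:

[osd]
# Illustrative values only: raise the filestore/journal throttles well
# above their defaults, then re-test and tune against your own results.
filestore_queue_max_ops = 50000
filestore_queue_max_bytes = 1048576000
filestore_queue_committing_max_ops = 50000
filestore_queue_committing_max_bytes = 1048576000
journal_max_write_entries = 10000
journal_max_write_bytes = 1048576000
journal_queue_max_ops = 50000
journal_queue_max_bytes = 1048576000

And when I talk about driving the cluster with multiple clients and
aggregating their numbers, I mean several fio processes along these lines
run in parallel (the pool and image names below are just placeholders;
sum the iops they report):

fio --name=4k-randwrite --ioengine=rbd --clientname=admin \
    --pool=rbd --rbdname=test-image-1 \
    --rw=randwrite --bs=4k --iodepth=32 \
    --runtime=60 --time_based --group_reporting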