Hello (Greg in particular),

On Thu, 16 Oct 2014 10:06:58 -0700 Gregory Farnum wrote:

> [Re-added the list.]
>
> I assume you added more clients and checked that it didn't scale past
> that? You might look through the list archives; there are a number of
> discussions about how and how far you can scale SSD-backed cluster
> performance.

Indeed there are, and the first one I remember (while not SSD-backed, but
close enough) is by yours truly:
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg09537.html
In which you participated as well.

> Just scanning through the config options you set, you might want to
> bump up all the filestore and journal queue values a lot farther.
>
I did that back then, with little to no effect.

Which brings me to another point: only a fraction of these parameters
(visible when doing a live config dump) are documented, and while one can
guess what they probably do/mean and what their values denote, this is not
how it should be, especially when you expect people to tune these
parameters.
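
For reference, such a live dump can be taken through the OSD admin socket;
assuming the default socket path and osd.0 (adjust for your cluster),
something like this lists the filestore and journal queue settings in
question:

    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | \
        grep -E 'filestore_queue|journal_queue|journal_max_write'

The options Greg is talking about are the filestore_queue_max_ops/bytes,
filestore_queue_committing_max_ops/bytes, journal_queue_max_ops/bytes and
journal_max_write_entries/bytes family; the full dump contains far more
options than the documentation covers.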
Christian

> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Thu, Oct 16, 2014 at 9:51 AM, Mark Wu <wudx05@xxxxxxxxx> wrote:
> > Thanks for the reply. I am not using a single client. Writing to 5 rbd
> > volumes on 3 hosts can reach the peak. The client is fio, also running
> > on the osd nodes, but there are no bottlenecks on CPU or network. I
> > also tried running the clients on two non-osd servers, with the same
> > result.
> >
> > On Oct 17, 2014 at 12:29 AM, "Gregory Farnum" <greg@xxxxxxxxxxx> wrote:
> >
> >> If you're running a single client to drive these tests, that's your
> >> bottleneck. Try running multiple clients and aggregating their
> >> numbers.
> >> -Greg
> >>
> >> On Thursday, October 16, 2014, Mark Wu <wudx05@xxxxxxxxx> wrote:
> >>>
> >>> Hi list,
> >>>
> >>> During my tests, I found Ceph doesn't scale as I expected on a
> >>> 30-osd cluster.
> >>> The following is the information of my setup:
> >>>
> >>> HW configuration:
> >>>   15 Dell R720 servers; each server has:
> >>>     Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 20 cores,
> >>>     hyper-threading enabled.
> >>>     128GB memory
> >>>     two Intel 3500 SSDs, connected to a MegaRAID SAS 2208 controller,
> >>>     each disk configured as a separate raid0.
> >>>     two bonded 10GbE NICs, used for both the public network and the
> >>>     cluster network.
> >>>
> >>> SW configuration:
> >>>   OS CentOS 6.5, kernel 3.17, Ceph 0.86
> >>>   XFS as the file system for data.
> >>>   each SSD has two partitions, one for osd data and the other for
> >>>   the osd journal.
> >>>   the pool has 2048 pgs and 2 replicas.
> >>>   5 monitors running on 5 of the 15 servers.
> >>>
> >>> Ceph configuration (in-memory debugging options are disabled):
> >>>
> >>> [osd]
> >>> osd data = /var/lib/ceph/osd/$cluster-$id
> >>> osd journal = /var/lib/ceph/osd/$cluster-$id/journal
> >>> osd mkfs type = xfs
> >>> osd mkfs options xfs = -f -i size=2048
> >>> osd mount options xfs = rw,noatime,logbsize=256k,delaylog
> >>> osd journal size = 20480
> >>> osd mon heartbeat interval = 30
> >>> # Performance tuning
> >>> osd_max_backfills = 10
> >>> osd_recovery_max_active = 15
> >>> filestore merge threshold = 40
> >>> filestore split multiple = 8
> >>> filestore fd cache size = 1024
> >>> osd op threads = 64
> >>> # Recovery tuning
> >>> osd recovery max active = 1
> >>> osd max backfills = 1
> >>> osd recovery op priority = 1
> >>> throttler perf counter = false
> >>> osd enable op tracker = false
> >>> filestore_queue_max_ops = 5000
> >>> filestore_queue_committing_max_ops = 5000
> >>> journal_max_write_entries = 1000
> >>> journal_queue_max_ops = 5000
> >>> objecter_inflight_ops = 8192
> >>>
> >>> When I test with 7 servers (14 osds), the maximum iops of 4k random
> >>> writes I saw is 17k on a single volume and 44k on the whole cluster.
> >>> I expected the 30-osd cluster to approach 90k, but unfortunately,
> >>> with 30 osds it delivers almost the same performance as 14 osds,
> >>> sometimes even worse. I checked the iostat output on all the nodes,
> >>> which show similar numbers: the load is well distributed but disk
> >>> utilization is low. In the test with 14 osds, I can see higher disk
> >>> utilization (80%~90%). So do you have any tuning suggestions to
> >>> improve the performance with 30 osds?
> >>> Any feedback is appreciated.
> >>>
> >>> iostat output:
> >>> Device:  rrqm/s  wrqm/s  r/s   w/s      rsec/s  wsec/s     avgrq-sz  avgqu-sz  await  svctm  %util
> >>> sda      0.00    0.00    0.00  0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
> >>> sdb      0.00    88.50   0.00  5188.00  0.00    93397.00   18.00     0.90      0.17   0.09   47.85
> >>> sdc      0.00    443.50  0.00  5561.50  0.00    97324.00   17.50     4.06      0.73   0.09   47.90
> >>> dm-0     0.00    0.00    0.00  0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
> >>> dm-1     0.00    0.00    0.00  0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
> >>>
> >>> Device:  rrqm/s  wrqm/s  r/s   w/s      rsec/s  wsec/s     avgrq-sz  avgqu-sz  await  svctm  %util
> >>> sda      0.00    17.50   0.00  28.00    0.00    3948.00    141.00    0.01      0.29   0.05   0.15
> >>> sdb      0.00    69.50   0.00  4932.00  0.00    87067.50   17.65     2.27      0.46   0.09   43.45
> >>> sdc      0.00    69.00   0.00  4855.50  0.00    105771.50  21.78     0.95      0.20   0.10   46.40
> >>> dm-0     0.00    0.00    0.00  0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
> >>> dm-1     0.00    0.00    0.00  42.50    0.00    3948.00    92.89     0.01      0.19   0.04   0.15
> >>>
> >>> Device:  rrqm/s  wrqm/s  r/s   w/s      rsec/s  wsec/s     avgrq-sz  avgqu-sz  await  svctm  %util
> >>> sda      0.00    12.00   0.00  8.00     0.00    568.00     71.00     0.00      0.12   0.12   0.10
> >>> sdb      0.00    72.50   0.00  5046.50  0.00    113198.50  22.43     1.09      0.22   0.10   51.40
> >>> sdc      0.00    72.50   0.00  4912.00  0.00    91204.50   18.57     2.25      0.46   0.09   43.60
> >>> dm-0     0.00    0.00    0.00  0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
> >>> dm-1     0.00    0.00    0.00  18.00    0.00    568.00     31.56     0.00      0.17   0.06   0.10
> >>>
> >>> Regards,
> >>> Mark Wu
> >>>
> >>
> >> --
> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
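
For anyone who wants to reproduce the kind of run Mark describes, a 4k
random-write test against an RBD image can be driven with fio's rbd engine
along these lines (the pool and image names below are placeholders; run one
such job per volume/client and aggregate the numbers):

    fio --name=rbd-4k-randwrite --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=testvol01 --rw=randwrite --bs=4k \
        --iodepth=32 --direct=1 --time_based --runtime=60 --group_reporting

The rbd engine talks to the cluster through librbd, so it can be run from
any host with a ceph.conf and client keyring, not just the osd nodes.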

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com