forgot to cc the list
---------- Forwarded message ----------
From: "Mark Wu" <wudx05@xxxxxxxxx>
Date: Oct 17, 2014, 12:51 AM
Subject: Re: Performance doesn't scale well on a full ssd cluster.
To: "Gregory Farnum" <greg@xxxxxxxxxxx>
Cc:
Thanks for the reply. I am not using a single client. Writing to 5 rbd volumes on 3 hosts can reach the peak. The clients are fio processes, also running on the osd nodes, but there are no bottlenecks on CPU or network. I also tried running the clients on two non-osd servers, with the same result.
On Oct 17, 2014, at 12:29 AM, "Gregory Farnum" <greg@xxxxxxxxxxx> wrote:
If you're running a single client to drive these tests, that's your bottleneck. Try running multiple clients and aggregating their numbers.
-Greg
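The aggregation Greg suggests can be done mechanically from fio's JSON output (`fio --output-format=json` reports a per-job `write.iops` field). A minimal sketch, assuming each client's result has been saved to a JSON file or string:

```python
import json

def aggregate_write_iops(json_texts):
    """Sum the write IOPS reported by several fio --output-format=json runs.

    Each element of json_texts is the JSON text produced by one fio client;
    fio reports per-job results under the "jobs" list, with random-write
    throughput in jobs[i]["write"]["iops"].
    """
    total = 0.0
    for text in json_texts:
        result = json.loads(text)
        for job in result["jobs"]:
            total += job["write"]["iops"]
    return total
```

The per-client file names and how the results are collected (scp, shared filesystem, etc.) are left out; only the summing logic is shown.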
On Thursday, October 16, 2014, Mark Wu <wudx05@xxxxxxxxx> wrote:

Hi list,

During my test, I found ceph doesn't scale as I expected on a 30 osds cluster. The following is the information of my setup:

HW configuration:
15 Dell R720 servers, and each server has:
- Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 20 cores, hyper-threading enabled
- 128GB memory
- two Intel 3500 SSD disks, connected to a MegaRAID SAS 2208 controller; each disk is configured as a separate raid0
- bonding with two 10GbE nics, used for both the public network and the cluster network

SW configuration:
- OS CentOS 6.5, kernel 3.17, Ceph 0.86
- XFS as the file system for data
- each SSD disk has two partitions: one for osd data and the other for the osd journal
- the pool has 2048 pgs, 2 replicas
- 5 monitors running on 5 of the 15 servers

Ceph configuration (in-memory debugging options are disabled):

[osd]
osd data = /var/lib/ceph/osd/$cluster-$id
osd journal = /var/lib/ceph/osd/$cluster-$id/journal
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = rw,noatime,logbsize=256k,delaylog
osd journal size = 20480
osd mon heartbeat interval = 30
# Performance tuning filestore
osd_max_backfills = 10
osd_recovery_max_active = 15
merge threshold = 40
filestore split multiple = 8
filestore fd cache size = 1024
osd op threads = 64
# Recovery tuning
osd recovery max active = 1
osd max backfills = 1
osd recovery op priority = 1
throttler perf counter = false
osd enable op tracker = false
filestore_queue_max_ops = 5000
filestore_queue_committing_max_ops = 5000
journal_max_write_entries = 1000
journal_queue_max_ops = 5000
objecter_inflight_ops = 8192

When I test with 7 servers (14 osds), the maximum iops of 4k random write I saw is 17k on a single volume and 44k on the whole cluster. I expected the number for the 30 osds cluster to approach 90k. But unfortunately, I found that with 30 osds it provides almost the same performance as 14 osds, sometimes even worse. I checked the iostat output on all the nodes, which show similar numbers.
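The ~90k expectation follows from assuming IOPS scale linearly with the osd count. A trivial sketch of that arithmetic (the function name is illustrative, not from any Ceph tool):

```python
def expected_iops(baseline_iops, baseline_osds, target_osds):
    """Naive linear-scaling estimate: IOPS grow proportionally with osd count."""
    return baseline_iops * target_osds / baseline_osds

# 44k IOPS measured on 14 osds -> roughly 94k on 30 osds under linear scaling,
# which is where the ~90k expectation comes from.
estimate = expected_iops(44_000, 14, 30)
```

Real clusters rarely scale perfectly linearly (replication traffic, client fan-out, and PG distribution all interfere), so this is an upper bound rather than a prediction.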
It's well distributed but disk utilization is low. In the test with 14 osds, I can see higher utilization of disk (80%~90%). So do you have any tuning suggestions to improve the performance with 30 osds? Any feedback is appreciated.

iostat output:

Device: rrqm/s  wrqm/s  r/s     w/s      rsec/s  wsec/s     avgrq-sz  avgqu-sz  await  svctm  %util
sda     0.00    0.00    0.00    0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
sdb     0.00    88.50   0.00    5188.00  0.00    93397.00   18.00     0.90      0.17   0.09   47.85
sdc     0.00    443.50  0.00    5561.50  0.00    97324.00   17.50     4.06      0.73   0.09   47.90
dm-0    0.00    0.00    0.00    0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
dm-1    0.00    0.00    0.00    0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00

Device: rrqm/s  wrqm/s  r/s     w/s      rsec/s  wsec/s     avgrq-sz  avgqu-sz  await  svctm  %util
sda     0.00    17.50   0.00    28.00    0.00    3948.00    141.00    0.01      0.29   0.05   0.15
sdb     0.00    69.50   0.00    4932.00  0.00    87067.50   17.65     2.27      0.46   0.09   43.45
sdc     0.00    69.00   0.00    4855.50  0.00    105771.50  21.78     0.95      0.20   0.10   46.40
dm-0    0.00    0.00    0.00    0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
dm-1    0.00    0.00    0.00    42.50    0.00    3948.00    92.89     0.01      0.19   0.04   0.15

Device: rrqm/s  wrqm/s  r/s     w/s      rsec/s  wsec/s     avgrq-sz  avgqu-sz  await  svctm  %util
sda     0.00    12.00   0.00    8.00     0.00    568.00     71.00     0.00      0.12   0.12   0.10
sdb     0.00    72.50   0.00    5046.50  0.00    113198.50  22.43     1.09      0.22   0.10   51.40
sdc     0.00    72.50   0.00    4912.00  0.00    91204.50   18.57     2.25      0.46   0.09   43.60
dm-0    0.00    0.00    0.00    0.00     0.00    0.00       0.00      0.00      0.00   0.00   0.00
dm-1    0.00    0.00    0.00    18.00    0.00    568.00     31.56     0.00      0.17   0.06   0.10

Regards,
Mark Wu
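The low-utilization observation above can be checked across many nodes by averaging the `%util` column (the last field of each device line in `iostat -x` output) for the osd data disks. A small sketch, assuming the raw text lines have already been collected:

```python
def mean_util(iostat_lines, devices=("sdb", "sdc")):
    """Average the %util column (last field) for the given devices
    across a batch of iostat -x output lines."""
    utils = []
    for line in iostat_lines:
        fields = line.split()
        if fields and fields[0] in devices:
            utils.append(float(fields[-1]))
    return sum(utils) / len(utils) if utils else 0.0
```

With the samples above, the SSDs hover near 45-50% busy, versus the 80-90% seen on the 14-osd cluster, which is what suggests the bottleneck is above the disks.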
--
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com