Hi, Ning

Thanks for the advice. We did the things you suggested in our performance tuning work; in fact, tuning memory usage was the first thing we tried.

First, the omap-to-SSD benefit shows up under a quite intensive workload: 140 VMs doing randwrite at QD 8 each, which drives each HDD to 95%+ utilization.

We did hope for, and tested, tuning up the inode memory size and the fd cache size, since I believe that if the inode can always be hit in memory it helps more than using omap. Sadly, our servers only have 32 GB of memory in total. Even with the xattr size set to 65535 (as originally configured) and the fd cache size set to 10240 (as I remember), we gained only a little performance and risked OOMs on the OSDs, so that is why we came up with the solution of moving omap out to an SSD device.

Another reason to move omap out is that it helps with performance analysis. Omap goes through the key/value store (leveldb), and each rbd request causes one or more 4k inode operations, which leads to a frontend-to-backend throughput ratio of 1:5.8, and the 5.8 is not that easy to explain.

We can also get more randwrite IOPS when there is no sequential write going to an HDD device: when an HDD handles randwrite IOPS plus some omap (leveldb) writes, we only get about 175 write IOPS per HDD at nearly full utilization; when an HDD handles only randwrite without any omap writes, we get about 325 write IOPS per HDD at nearly full utilization.

For the system data, please refer to the url below:
http://xuechendi.github.io/data/
"omap on HDD" is before mapping omap to another device; "omap on SSD" is after.
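For what it's worth, here is a minimal back-of-the-envelope sketch of how we back the frontend/backend ratio out of the fio (rbd side) and iostat (HDD side) totals quoted below. The helper name is just for illustration, and the split into "replica writes" vs. "everything else" is only an estimate based on the breakdown in the quoted mail (2 replica writes plus inode and omap writes), not a measured decomposition:

    # Rough check of the frontend/backend write amplification, using the
    # fio and iostat totals from the two runs quoted below.
    # The replica/extra split is an estimate, not a measured breakdown.

    def amplification(fio_iops, osd_iops, replicas=2):
        ratio = osd_iops / fio_iops   # HDD writes seen per rbd write
        extra = ratio - replicas      # left over after plain replica data writes
        return ratio, extra

    # run 332: omap (leveldb) and data share the same HDDs
    print(amplification(1206.000, 7034.975))     # -> about (5.8, 3.8)

    # run 326: omap moved to a separate SSD
    print(amplification(5006.000, 13089.020))    # -> about (2.6, 0.6)

With omap on the HDDs, the extra ~3.8 backend writes per client write are inode plus omap (leveldb) traffic; with omap on the SSD, the extra drops to ~0.6, which should be mostly inode/metadata updates.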
Best regards,
Chendi

-----Original Message-----
From: Ning Yao [mailto:zay11022@xxxxxxxxx]
Sent: Wednesday, November 4, 2015 3:09 PM
To: Xue, Chendi <chendi.xue@xxxxxxxxx>
Cc: Samuel Just <sjust@xxxxxxxxxx>; ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: Specify omap path for filestore

Hi, Chendi,

I don't think it will be a big improvement compared with the normal way of using FileStore (enable filestore_max_inline_xattr_xfs and tune filestore_fd_cache_size, osd_pg_object_context_cache_count and filestore_omap_header_cache_size properly to achieve a high hit rate). Did you enable filestore_max_inline_xattr in the first test? If not, the result may be reasonable. In my previous test, I remember only about a 20%~30% improvement. And can you also provide the CPU cost per op on the OSD nodes?

Regards
Ning Yao

2015-10-30 10:04 GMT+08:00 Xue, Chendi <chendi.xue@xxxxxxxxx>:
> Hi, Sam
>
> Last week I introduced how we saw the benefit of moving omap to a separate device.
>
> Here is the pull request:
> https://github.com/ceph/ceph/pull/6421
>
> I have tested redeploying and restarting the ceph cluster on my setup, and the code works fine.
> One problem: do you think I should *DELETE* all the files under the omap_path first? I notice that if old pg data is left there, the osd daemon may run into chaos. But I am not sure whether the DELETE should be left to users.
>
> Any thoughts?
>
> I also pasted some of the data I mentioned, showing the rbd vs. osd write IOPS ratio when doing randwrite to an rbd device.
>
> ====== Here is some data ======
> We use 4 clients, 35 VMs each, to test rbd randwrite.
> 4 OSD physical nodes, each with 10 HDDs as OSDs and 2 SSDs as journals
> 2 replicas
> filestore_max_inline_xattr_xfs=0
> filestore_max_inline_xattr_size_xfs=0
>
> Before moving omap to a separate SSD, we saw a frontend-to-backend IOPS ratio of 1:5.8 (rbd side total IOPS 1206, HDD total IOPS 7034). As we discussed, the 5.8 consists of the 2 replica writes plus inode and omap writes.
>
> runid  op_size  op_type    QD   engine   serverNum  clientNum  rbdNum  runtime  fio_iops  fio_bw       fio_latency   osd_iops   osd_bw       osd_latency
> 332    4k       randwrite  qd8  qemurbd  4          4          140     400 sec  1206.000  4.987 MB/s   884.617 msec  7034.975   47.407 MB/s  242.620 msec
>
> After moving omap to a separate SSD, the frontend vs. backend ratio drops to 1:2.6 (rbd side total IOPS 5006, HDD total IOPS 13089).
>
> runid  op_size  op_type    QD   engine   serverNum  clientNum  rbdNum  runtime  fio_iops  fio_bw       fio_latency   osd_iops   osd_bw       osd_latency
> 326    4k       randwrite  qd8  qemurbd  4          4          140     400 sec  5006.000  19.822 MB/s  222.296 msec  13089.020  82.897 MB/s  482.203 msec
>
> Best regards,
> Chendi
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> in the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html