RE: Specify omap path for filestore

Hi, Ning

Thanks for the advice. We did try the things you suggested in our performance tuning work; in fact, tuning memory usage was the first thing we tried.

Firstly, I should point out that the omap-to-SSD benefit shows up under a quite intensive workload: 140 VMs doing randwrite at queue depth 8 each, which drives each HDD to 95%+ utilization.
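
For reference, the per-VM workload is essentially a fio job like the one below (a rough sketch: the job name, ioengine and guest device path are my assumptions; only the 4k randwrite, iodepth 8 and 400 sec runtime come from our runs):

  [randwrite-4k]
  ioengine=libaio      ; assumption: fio runs inside each guest against its rbd-backed virtio disk
  filename=/dev/vdb    ; assumption: guest device name
  direct=1
  rw=randwrite
  bs=4k
  iodepth=8
  time_based=1
  runtime=400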

We did hope for and test tuning up the inode/xattr memory size and the fd cache size, since I believe that if the inode can always be hit in memory, it definitely benefits more than using omap. Sadly our servers only have 32 GB of memory in total. Even with the xattr size set to 65535 as originally configured and the fd cache size set to 10240 (as I remember), we only gained a little performance, and it could lead to OOM kills of OSDs. That is why we came up with the solution of moving omap out to an SSD device.
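
For reference, the kind of settings we experimented with looked roughly like this in ceph.conf (values recalled from memory, so treat this as illustrative rather than our exact config):

  [osd]
      filestore_max_inline_xattr_size_xfs = 65535   ; keep xattrs inline in the XFS inode
      filestore_fd_cache_size = 10240               ; larger fd cache, at the cost of OSD memory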

Another reason to move omap out is that it helps with performance analysis: omap goes through a key-value store (leveldb), and each rbd request causes one or more 4k inode/omap operations, which leads to a frontend-to-backend throughput ratio of 1:5.8, and the 5.8 is not that easy to explain.
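
To put a number on it: in run 332 below, 7034 HDD IOPS / 1206 rbd IOPS ≈ 5.8, and that multiplier has to be shared between the 2 replica data writes and the extra inode and omap (leveldb) writes per replica, which is hard to attribute precisely.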

Also, we can get more randwrite IOPS if there are no sequential writes hitting the same HDD. When an HDD handles randwrite IOPS plus some omap (leveldb) writes, we only get about 175 disk write IOPS per HDD at nearly full utilization;
when the HDD handles only randwrite, without any omap writes, we get about 325 disk write IOPS per HDD at nearly full utilization.

For the system data, please refer to the URL below:
http://xuechendi.github.io/data/

"omap on HDD" means before moving omap to another device;
"omap on SSD" means after.
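
In case anyone wants to reproduce the "omap on SSD" case without the patch, the manual relocation is roughly the following (a sketch only: it assumes a default /var/lib/ceph/osd/ceph-$id filestore layout, an SSD filesystem mounted at /mnt/ssd, and whatever init command your distro uses to stop/start the OSD; the pull request replaces this hack with a proper config option):

  # stop the OSD, move its leveldb omap directory onto the SSD, then symlink it back
  service ceph stop osd.0
  mv /var/lib/ceph/osd/ceph-0/current/omap /mnt/ssd/osd-0-omap
  ln -s /mnt/ssd/osd-0-omap /var/lib/ceph/osd/ceph-0/current/omap
  service ceph start osd.0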

Best regards,
Chendi


-----Original Message-----
From: Ning Yao [mailto:zay11022@xxxxxxxxx] 
Sent: Wednesday, November 4, 2015 3:09 PM
To: Xue, Chendi <chendi.xue@xxxxxxxxx>
Cc: Samuel Just <sjust@xxxxxxxxxx>; ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: Specify omap path for filestore

Hi, Chendi,
I don't think it will be a big improvement compared with the normal way of using FileStore (enable filestore_max_inline_xattr_xfs and tune filestore_fd_cache_size, osd_pg_object_context_cache_count, and filestore_omap_header_cache_size properly to achieve a high hit rate). Did you enable filestore_max_inline_xattr in the first test? If not, your result may be reasonable. In my previous test, I remember only about a 20%~30% improvement.
Also, can you provide the CPU cost per op on the OSD nodes?
Regards
Ning Yao


2015-10-30 10:04 GMT+08:00 Xue, Chendi <chendi.xue@xxxxxxxxx>:
> Hi, Sam
>
> Last week I described how we saw a benefit from moving omap to a separate device.
>
> And here is the pull request:
> https://github.com/ceph/ceph/pull/6421
>
> I have tested redeploying and restarting the ceph cluster on my setup, and the code works fine.
> One question: do you think I should *DELETE* all the files under the omap_path first? I noticed that if old pg data is left there, the osd daemon may run into chaos, but I am not sure whether the deletion should be left to users.
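>
> For illustration, the operator-side cleanup I am asking about is just something like the following before (re)starting the OSD (the path is hypothetical):
>
>   # wipe stale pg/omap data left on a reused omap_path partition
>   rm -rf /mnt/ssd/osd-0-omap/*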
>
> Any thoughts?
>
> I have also pasted some of the data I mentioned, which shows the rbd-to-osd write IOPS ratio when doing randwrite to an rbd device.
>
> ======Here is some data=====
> We use 4 clients, 35 VMs each, to test rbd randwrite.
> 4 OSD physical nodes, each with 10 HDDs as OSDs and 2 SSDs as journals
> 2 replicas
> filestore_max_inline_xattr_xfs=0
> filestore_max_inline_xattr_size_xfs=0
>
> Before moving omap to a separate SSD, we saw a frontend-to-backend IOPS
> ratio of 1:5.8: rbd-side total IOPS 1206, HDD total IOPS 7034. As we discussed, the 5.8 consists of the 2 replica writes plus the inode and omap writes.
> run 332: op_size=4k, op_type=randwrite, QD=qd8, engine=qemurbd, serverNum=4, clientNum=4, rbdNum=140, runtime=400 sec
>          fio:  iops=1206.000,  bw=4.987 MB/s,   latency=884.617 msec
>          osd:  iops=7034.975,  bw=47.407 MB/s,  latency=242.620 msec
>
> And after moving omap to a separate SSD, the frontend-to-backend ratio drops to 1:2.6: rbd-side total IOPS 5006, HDD total IOPS 13089.
> run 326: op_size=4k, op_type=randwrite, QD=qd8, engine=qemurbd, serverNum=4, clientNum=4, rbdNum=140, runtime=400 sec
>          fio:  iops=5006.000,  bw=19.822 MB/s,  latency=222.296 msec
>          osd:  iops=13089.020, bw=82.897 MB/s,  latency=482.203 msec
>
>
> Best regards,
> Chendi