Hi Sage and Mark,

Chendi from our team ran this test on v0.91. The setup is 4 nodes, 40 HDDs in total with SSDs as journals, replica=2. Mounting a partition from the journal SSD at current/omap raised peak 4K random write IOPS from 1524 to 2694, a 76% improvement, while the other IO patterns stayed the same. Details are below. If this reproduces on other setups, I suspect it is worth spending some time on the detection.

                     Prev        Omap2ssd
    Runid            305         320
    OP_SIZE          4k          4k
    OP_TYPE          randwrite   randwrite
    QD               qd8         qd8
    Engine           vdb         vdb
    server_num       4           4
    client_num       2           2
    rbd_num          40          40
    RBD_FIO_IOPS     1524        2694
    RBD_FIO_BW       6170.1      10864.23
    RBD_FIO_Latency  209.3851    119.4587
    osd_read_iops    7.862196    322.4334
    osd_write_iops   7677.648    10930
    osd_read_bw      0.446566    1.409266
    osd_write_bw     54.916435   71.33833

Xiaoxi

-----Original Message-----
From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
Sent: Wednesday, April 22, 2015 7:59 AM
To: Sage Weil; Chen, Xiaoxi
Cc: Haomai Wang; Somnath Roy; Duan, Jiangang; Zhang, Jian; ceph-devel
Subject: Re: 回复: Re: 回复: Re: 回复: Re: NewStore performance analysis

On 04/21/2015 06:57 PM, Sage Weil wrote:
> On Tue, 21 Apr 2015, Chen, Xiaoxi wrote:
>> ---- Sage Weil wrote ----
>>
>>> On Tue, 21 Apr 2015, Chen, Xiaoxi wrote:
>>>> Haomai is right in theory, but I am not sure whether all users
>>>> (mon, filestore, kvstore) of the submit_transaction API clearly
>>>> hold the expectation that their data is not persistent and may be
>>>> lost on failure. So in rocksdb the sync option now defaults to
>>>> true even in submit_transaction (and this option makes the two
>>>> APIs exactly the same). Maybe we should rename the API to
>>>> submit_transaction_persistent/nonpersistent to better describe
>>>> the behavior?
>>>
>>> Let's audit them, then.. I think they are right, but we may as well
>>> confirm!
>>>
>>> Again, FileStore is the odd one out here because it is relying on
>>> the syncfs(2) at commit time for everything.
>>>
>>
>> Yes, so maybe we don't need to expose the option to the user; we
>> can decide whether to sync in the code logic.
>
> Yeah, I think it'll reduce confusion too. I suggest we do a pull
> request against master that does this... let me know if you want to
> do it, otherwise I will!
>
>> I remember some folks on our team tried moving the KVDB to a
>> partition on SSD while leaving the other filestore data on HDD, and
>> as I recall it benefited performance. That deployment is
>> problematic with kv_sync=false. Will check the data first, and then
>> we can evaluate whether we want to support this kind of deployment.
>
> We could detect this by doing a stat(2) on the current/omap/ vs
> current/ dirs and checking if it's a different file system. If so,
> we can do the syncfs(2) on both dirs. The btrfs case would probably
> not be practical, but we can error out in that case. But yeah, not
> sure how important it would be to support this since filestore
> doesn't use leveldb that heavily... and I'd prefer to limit our
> investment of time there if we can instead make newstore (or
> something else) better.

FWIW, the last time I tried putting leveldb on SSD it didn't really help at all. It's been a while, so maybe that's changed, but newstore definitely seems like the way forward to me.

Mark
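Sage's detection idea above (stat(2) on current/omap/ vs current/ and compare filesystems) could be sketched roughly like this. This is only an illustration, not actual FileStore code; the function name and the conservative error handling are my own assumptions.

```cpp
// Sketch: decide whether current/omap lives on a different filesystem
// than current/, in which case a single syncfs(2) would not cover both.
// Not FileStore code; names here are illustrative.
#include <sys/stat.h>
#include <string>

// Returns true if the two paths are backed by different filesystems
// (different st_dev device ids); returns false on a stat() error as a
// conservative default for this sketch.
bool on_different_filesystems(const std::string& a, const std::string& b)
{
  struct stat sa, sb;
  if (stat(a.c_str(), &sa) < 0 || stat(b.c_str(), &sb) < 0)
    return false;
  return sa.st_dev != sb.st_dev;
}

// A caller would then do something like:
//   if (on_different_filesystems(basedir + "/current",
//                                basedir + "/current/omap"))
//     /* syncfs(2) an fd inside each mount instead of just one */;
```

If the check returns true, the commit path would open an fd inside each mount and syncfs(2) both; on btrfs, which relies on snapshots rather than syncfs for consistency, FileStore could simply error out on this layout, as Sage suggests.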