Hey Haomai,

Nice work! By any chance, do you have a branch that contains all the changes you've made, so that people can try them themselves? :)

I look forward to reading more results :)

Thanks!
Cheers.
––––
Sébastien Han
Cloud Engineer

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72
Mail: sebastien.han@xxxxxxxxxxxx
Address: 11 bis, rue Roquépine - 75008 Paris
Web: www.enovance.com - Twitter: @enovance

On 09 Apr 2014, at 14:08, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:

> Hi,
>
> Thanks for sharing!
> (I'm looking to build a full-SSD cluster too, with 1TB SSDs.)
>
>>> With these, we achieve 3x-4x performance improvements on filestore with SSD.
>
> Do you have some IOPS benchmark values, before and after?
>
> ----- Original Message -----
>
> From: "Haomai Wang" <haomaiwang@xxxxxxxxx>
> To: ceph-devel@xxxxxxxxxxxxxxx
> Sent: Wednesday, 9 April 2014 12:05:19
> Subject: [Share] Performance tuning on Ceph FileStore with SSD backend
>
> Hi all,
>
> I would like to share some ideas about how to improve performance on
> Ceph with SSDs. These are rough observations, not precise measurements.
>
> Our SSDs are 500GB and each OSD owns one SSD (the journal is on the
> same SSD). The Ceph version is 0.67.5 (Dumpling).
>
> At first, we found three bottlenecks in FileStore:
> 1. fdcache_lock (changed in the Firefly release)
> 2. lfn_find in the omap_* methods
> 3. the DBObjectMap header
>
> Based on my understanding and the docs in
> ObjectStore.h (https://github.com/ceph/ceph/blob/master/src/os/ObjectStore.h),
> I simply removed lfn_find from the omap_* methods and removed
> fdcache_lock. I'm not fully sure this change is correct, but it has
> worked well so far.
>
> The DBObjectMap header patch is in the pull request queue and may be
> merged in the next feature merge window.
>
> With the changes above, we got a large improvement in disk utilization
> and benchmark results (3x-4x).
>
> Next, we found the fdcache size became the main bottleneck. For
> example, if the hot data range is 100GB, we need 25,000 (100GB/4MB)
> fds in the cache; if the hot data range is 1TB, we need 250,000
> (1000GB/4MB). As "filestore_fd_cache_size" grows, the cost of an
> FDCache lookup and of a cache miss becomes unaffordable, because the
> FDCache implementation isn't O(1). So we only get high performance
> while the hot data fits in the fdcache (maybe 100GB with a
> 10240-entry fdcache), and any data set larger than the fdcache is a
> disaster. If you try to cache more fds (a 102400-entry fdcache), the
> FDCache implementation adds non-negligible extra CPU cost to every op.
>
> Because of the capacity of our SSDs (several hundred GB), we tried
> increasing the rbd object size (to 16MB) so that fewer cached fds are
> needed: a 1TB hot range then needs only 62,500 fds instead of 250,000.
> For the FDCache implementation, we simply discarded SimpleLRU and
> introduced a RandomCache. Now we can set a much larger fdcache size
> (caching nearly all fds) with little overhead.
>
> With these changes, we achieve 3x-4x performance improvements on
> FileStore with SSDs.
>
> I may have missed something or gotten something wrong; I hope you can
> correct me if so. I hope this helps improve FileStore on SSDs and can
> be pushed into the master branch.
>
> --
>
> Best Regards,
>
> Wheat
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
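
To make the lfn_find change above concrete: each FileStore omap method first resolved the object's long filename on disk before touching the DBObjectMap. A rough sketch of the pattern (approximate and from memory; the exact Dumpling signatures may differ):

  int FileStore::omap_get(coll_t c, const ghobject_t &hoid,
                          bufferlist *header,
                          map<string, bufferlist> *out)
  {
    IndexedPath path;
    int r = lfn_find(c, hoid, &path);  // on-disk lookup whose only role here
    if (r < 0)                         // is to confirm the object exists
      return r;
    return object_map->get(hoid, header, out);  // the actual omap read
  }

The change Haomai describes drops the lfn_find call and goes straight to object_map->get(), saving one filesystem metadata lookup per omap operation; the open question he raises is whether skipping the existence check is always safe.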
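
For reference, the two knobs involved in the fdcache discussion can be set as follows. filestore_fd_cache_size is the option named in the mail; rbd's --order flag sets the object size as a power of two, so order 24 = 2^24 bytes = 16MB (the default is order 22 = 4MB). The values below are illustrative, not recommendations:

  # ceph.conf: enlarge the FileStore fd cache
  [osd]
  filestore fd cache size = 102400

  # create an rbd image with 16MB objects instead of the default 4MB
  rbd create mypool/test-image --size 102400 --order 24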
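
As for replacing SimpleLRU with a RandomCache: the appeal of random eviction is that it needs no recency bookkeeping, so a hit is a single hash probe and eviction is O(1) no matter how large the cache grows. A minimal generic sketch of the technique (an illustration only, not Ceph's actual RandomCache):

  // Random-eviction cache: an unordered_map for lookups plus a dense
  // key array so a uniformly random victim can be picked in O(1).
  #include <cstdlib>
  #include <unordered_map>
  #include <vector>

  template <typename K, typename V>
  class RandomEvictionCache {
    struct Entry { V value; size_t slot; };  // slot = index into keys_
    std::unordered_map<K, Entry> map_;
    std::vector<K> keys_;  // dense key array for O(1) victim selection
    size_t max_size_;

  public:
    explicit RandomEvictionCache(size_t max_size) : max_size_(max_size) {}

    void add(const K& key, const V& value) {
      auto it = map_.find(key);
      if (it != map_.end()) { it->second.value = value; return; }
      if (map_.size() >= max_size_) {
        // Evict a uniformly random entry: swap-remove it from the
        // dense array, then fix the moved key's stored slot.
        size_t victim = std::rand() % keys_.size();
        K victim_key = keys_[victim];
        keys_[victim] = keys_.back();
        map_[keys_[victim]].slot = victim;
        keys_.pop_back();
        map_.erase(victim_key);
      }
      keys_.push_back(key);
      map_.emplace(key, Entry{value, keys_.size() - 1});
    }

    // A hit costs one hash probe; unlike LRU there is no recency list
    // to update, so hits stay cheap even with very large caches.
    bool lookup(const K& key, V* out) const {
      auto it = map_.find(key);
      if (it == map_.end()) return false;
      *out = it->second.value;
      return true;
    }
  };

In FileStore terms the key would identify the object and the value the cached fd. A random victim may occasionally be hot, but with the cache sized to hold nearly every fd, as Haomai suggests, evictions become rare anyway.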