Re: [Share] Performance tuning on Ceph FileStore with SSD backend


 



On 04/09/2014 05:05 AM, Haomai Wang wrote:
Hi all,


Hi Haomai!

I would like to share some ideas about how to improve performance on
Ceph with SSDs. It isn't very precise.

Aha, that's ok, but I'm going to pester you with lots of questions below. ;)


Our SSDs are 500GB and each OSD owns one SSD (the journal is on the same SSD).
The Ceph version is 0.67.5 (Dumpling).

At first, we found three bottlenecks in FileStore:
1. fdcache_lock (changed in the Firefly release)
2. lfn_find in the omap_* methods
3. DBObjectMap header

According to my understanding and the docs in
ObjectStore.h (https://github.com/ceph/ceph/blob/master/src/os/ObjectStore.h),
I simply removed lfn_find from the omap_* methods and removed fdcache_lock.
I'm not fully sure about the correctness of this change, but it has worked
well so far.

Yes, but I think it's interesting even if it's not safe! Did you happen to test these things in isolation to see how much of a bottleneck each is?


The DBObjectMap header patch is in the pull request queue and may be
merged in the next feature merge window.

With the changes above, we got a big improvement in disk utilization
and benchmark results (3x-4x).

That's a pretty dramatic result! What kind of tests did you perform where you observed the 3-4x difference? Did you measure latency and iops/throughput?


Next, we found that the fdcache size becomes the main bottleneck. For
example, if the hot data range is 100GB, we need to cache 25000
(100GB/4MB) fds; if the hot data range is 1TB, we need to cache 250000
(1000GB/4MB) fds. When we increase "filestore_fd_cache_size", the cost
of FDCache lookups and cache misses becomes too expensive to afford,
because the FDCache implementation isn't O(1). So we only get high
performance while the hot data fits in the fdcache (maybe 100GB with a
10240-entry fdcache), and a hot data set larger than the fdcache is a
disaster. If you want to cache more fds (a 102400-entry fdcache), the
FDCache implementation adds non-negligible extra CPU cost to every op.
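(To put rough numbers on it, following the arithmetic above: 100GB / 4MB is
about 25,000 fds and 1TB / 4MB is about 250,000 fds; with the 16MB objects
described in the next paragraph, the same 1TB of hot data needs only about
62,500 fds.)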

Because of the capacity of our SSDs (several hundred GB), we tried
increasing the RBD object size (to 16MB) so that fewer fds need to be
cached. As for the FDCache implementation, we simply discarded the
SimpleLRU and introduced a RandomCache. Now we can set a much larger
fdcache size (caching nearly all fds) with little overhead.

With these changes, we achieve 3x-4x performance improvements on FileStore with SSDs.

I'm curious how much of an effect changing the RBD object size had before and after you applied the new FDCache implementation?
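Also, just to check that I'm picturing the RandomCache idea correctly: is it
roughly a fixed-size map with random eviction, something like the sketch
below? This is only my guess at the shape of it, not your actual patch, and
the class and member names here are made up for illustration.

// Hypothetical illustration of a random-eviction cache: lookups and
// inserts stay O(1) on average, and there is no per-access LRU
// bookkeeping. Not the real Ceph FDCache code.
#include <cstdlib>
#include <unordered_map>
#include <vector>

template <typename K, typename V>
class RandomCache {
  std::size_t max_size;
  std::unordered_map<K, V> contents;  // key -> cached value (e.g. an open fd handle)
  std::vector<K> keys;                // flat key list used to pick eviction victims

public:
  explicit RandomCache(std::size_t max) : max_size(max) {}

  void add(const K &key, const V &value) {
    if (contents.count(key))
      return;
    if (contents.size() >= max_size && !keys.empty()) {
      // Evict a uniformly random entry instead of tracking recency.
      std::size_t victim = std::rand() % keys.size();
      contents.erase(keys[victim]);
      keys[victim] = key;
    } else {
      keys.push_back(key);
    }
    contents[key] = value;
  }

  bool lookup(const K &key, V *out) {
    auto it = contents.find(key);
    if (it == contents.end())
      return false;
    *out = it->second;
    return true;
  }
};

If that's the gist of it, I can see why a much larger fdcache size stays
cheap: a hit only touches the hash map, and eviction doesn't require
maintaining any LRU list.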


Maybe I missed something or got something wrong; please correct me if so.
I hope this can help improve FileStore on SSDs and get pushed into the
master branch.




