Nice job Haomai!

––––
Sébastien Han
Cloud Engineer

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72
Mail: sebastien.han@xxxxxxxxxxxx
Address: 10, rue de la Victoire - 75009 Paris
Web: www.enovance.com - Twitter: @enovance

On 25 Nov 2013, at 02:50, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:

> On Mon, Nov 25, 2013 at 2:17 AM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:
> > Great Work! This is very exciting! Did you happen to try RADOS bench at
> > different object sizes and concurrency levels?
>
> Maybe I can try that later. :-)
>
> > Mark
> >
> > On 11/24/2013 03:01 AM, Haomai Wang wrote:
> > > Hi all,
> > >
> > > For the Emperor blueprint
> > > (http://wiki.ceph.com/01Planning/02Blueprints/Emperor/Add_LevelDB_support_to_ceph_cluster_backend_store),
> > > I'm sorry for the delay in progress. I have now finished most of the work
> > > toward the blueprint's goal. Because of Sage's Firefly blueprint
> > > (http://wiki.ceph.com/index.php?title=01Planning/02Blueprints/Firefly/osd:_new_key%2F%2Fvalue_backend),
> > > I need to adjust some code to match it. The branch is here:
> > > https://github.com/yuyuyu101/ceph/tree/wip/6173
> > >
> > > I have tested the LevelDB backend on three nodes (eight OSDs) and compared
> > > it to FileStore (ext4). I just used the built-in benchmark tool
> > > "rados bench" for the comparison. The default Ceph configuration is used,
> > > the replication size is 2, the filesystem is ext4, and nothing else was
> > > changed. The results are below (bandwidth columns in MB/sec, latencies in
> > > seconds):
> > >
> > > Rados Bench
> > >
> > > Test       Backend    Bandwidth  AvgLat    MaxLat    MinLat     StddevLat  StddevBW  MaxBW  MinBW
> > > Write 30   KVStore    24.590     4.87257   14.752    0.580851   2.97708    9.91938   44     0
> > > Write 30   FileStore  23.495     5.07716   13.0885   0.605118   3.30538    10.5986   76     0
> > > Write 20   KVStore    23.515     3.39745   11.6089   0.169507   2.58285    9.14467   44     0
> > > Write 20   FileStore  23.064     3.45711   11.5996   0.138595   2.75962    8.54156   40     0
> > > Write 10   KVStore    22.927     1.73815   5.53792   0.171028   1.05982    9.18403   44     0
> > > Write 10   FileStore  21.980     1.8198    6.46675   0.143392   1.20303    8.74401   40     0
> > > Write 5    KVStore    19.680     1.01492   3.10783   0.143758   0.561548   5.92575   36     0
> > > Write 5    FileStore  20.017     0.997019  3.05008   0.138161   0.571459   6.844     32     0
> > > Read 30    KVStore    65.852     1.80069   9.30039   0.115153   -          -         -      -
> > > Read 30    FileStore  60.688     1.96009   10.1146   0.061657   -          -         -      -
> > > Read 20    KVStore    59.372     1.30479   6.28435   0.016843   -          -         -      -
> > > Read 20    FileStore  60.738     1.28383   8.21304   0.012073   -          -         -      -
> > > Read 10    KVStore    65.502     0.608805  3.3917    0.016267   -          -         -      -
> > > Read 10    FileStore  55.814     0.7087    4.72626   0.011998   -          -         -      -
> > > Read 5     KVStore    64.176     0.307111  1.76391   0.017174   -          -         -      -
> > > Read 5     FileStore  54.928     0.364077  1.90182   0.011999   -          -         -      -
> > >
> > > Charts can be viewed here (http://img42.com/ziwjP+) and here
> > > (http://img42.com/LKhoo+).
> > >
> > > From the above, I'm relieved that the LevelDB backend isn't useless. Most
> > > of the metrics are better, and if the LevelDB cache size is increased the
> > > results may be even more attractive. What's more, the LevelDB backend is
> > > driven through "KeyValueStore", and many optimizations can still be done
> > > to improve performance, such as increasing the number of parallel threads
> > > or optimizing the I/O path.
> > >
> > > Next, I used "rbd bench-write" to test. The result is a pity:
> > >
> > > RBD Bench-Write
> > >
> > > Test         Backend    OPS/sec  Bytes/sec
> > > Seq 4096 5   KVStore    27.42    111861.51
> > > Seq 4096 5   FileStore  716.55   2492149.21
> > > Rand 4096 5  KVStore    28.27    112331.42
> > > Rand 4096 5  FileStore  504      1683151.29
> > >
> > > This is because the kv backend doesn't support read/write operations with
> > > offset/length arguments, so each such read/write has to issue an
> > > additional LevelDB read call. In the rbd case, much of the time is
> > > consumed by reading back the entire large object. There are some ways to
> > > change this, such as splitting a large object into multiple small objects,
> > > or saving metadata so that the expensive read can be avoided.
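> > > To make the "split a large object into multiple small objects" idea a bit
> > > more concrete, here is a rough sketch of one possible striped layout. It
> > > is only an illustration, not what the branch currently does: a std::map
> > > stands in for LevelDB, and the 4 KB stripe size, the key format, the
> > > kv_write/kv_read helper names and the object name below are all made up.
> > > The point is that a small write at an offset only touches the stripes
> > > covering that range, instead of reading the whole object value back first:
> > >
> > > // Sketch only: stripe an object's data across small key/value entries so
> > > // a 4 KB write touches one small value instead of the whole object.
> > > #include <algorithm>
> > > #include <cstdint>
> > > #include <cstdio>
> > > #include <map>
> > > #include <string>
> > >
> > > static const uint64_t STRIPE = 4096;           // assumed stripe size
> > > static std::map<std::string, std::string> kv;  // stand-in for LevelDB
> > >
> > > // Key of the stripe holding byte range [idx*STRIPE, (idx+1)*STRIPE).
> > > static std::string stripe_key(const std::string &oid, uint64_t idx) {
> > >   char suffix[20];
> > >   snprintf(suffix, sizeof(suffix), "%016llx", (unsigned long long)idx);
> > >   return oid + "." + suffix;
> > > }
> > >
> > > // Write [off, off+data.size()), touching only the stripes it covers.
> > > void kv_write(const std::string &oid, uint64_t off, const std::string &data) {
> > >   for (uint64_t pos = 0; pos < data.size(); ) {
> > >     uint64_t in_off = (off + pos) % STRIPE;
> > >     uint64_t len = std::min<uint64_t>(STRIPE - in_off, data.size() - pos);
> > >     std::string &val = kv[stripe_key(oid, (off + pos) / STRIPE)];
> > >     if (val.size() < in_off + len)
> > >       val.resize(in_off + len, '\0');          // extend this stripe only
> > >     val.replace(in_off, len, data, pos, len);
> > >     pos += len;
> > >   }
> > > }
> > >
> > > // Read [off, off+len) by walking the covering stripes; holes read as zeros.
> > > std::string kv_read(const std::string &oid, uint64_t off, uint64_t len) {
> > >   std::string out;
> > >   while (len > 0) {
> > >     uint64_t in_off = off % STRIPE;
> > >     uint64_t want = std::min(STRIPE - in_off, len);
> > >     std::map<std::string, std::string>::const_iterator it =
> > >         kv.find(stripe_key(oid, off / STRIPE));
> > >     std::string piece = (it != kv.end() && in_off < it->second.size())
> > >                             ? it->second.substr(in_off, want)
> > >                             : std::string();
> > >     piece.resize(want, '\0');
> > >     out += piece;
> > >     off += want;
> > >     len -= want;
> > >   }
> > >   return out;
> > > }
> > >
> > > int main() {
> > >   // Hypothetical rbd data object: write 4 KB at offset 8192, read it back.
> > >   kv_write("rb.0.1234.000000000001", 8192, std::string(4096, 'x'));
> > >   printf("read back %zu bytes\n",
> > >          kv_read("rb.0.1234.000000000001", 8192, 4096).size());
> > >   return 0;
> > > }
> > >
> > > A real backend would also need to track object size and handle truncate
> > > and clone, but the core win is that a 4 KB rbd write no longer has to read
> > > the entire object back from LevelDB.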
> > >
> > > As Sage mentioned in <osd: new key/value backend>
> > > (http://wiki.ceph.com/index.php?title=01Planning/02Blueprints/Firefly/osd:_new_key%2F%2Fvalue_backend),
> > > more kv backends can be added now, and I look forward to more people
> > > getting interested in it. I think the radosgw case can fit a kv store in a
> > > short time.
> > >
> > > --
> > > Best Regards,
> > >
> > > Wheat
>
> --
> Best Regards,
>
> Wheat

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com