To me these numbers look identical within error bars, and isn't that expected? The main benefit of Rocksdb over Leveldb shows up when you create large tables approaching 1 billion entries. How many keys did you create per OSD in your Rados benchmarks?

Cheers Andreas.

________________________________________
From: ceph-devel-owner@xxxxxxxxxxxxxxx [ceph-devel-owner@xxxxxxxxxxxxxxx] on behalf of Haomai Wang [haomaiwang@xxxxxxxxx]
Sent: 05 March 2014 09:31
To: Alexandre DERUMIER
Cc: Xinxin Shu; ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: [RFC] add rocksdb support

I think the reason there is so little difference between leveldb and rocksdb in FileStore is that the KeyValueDB backend isn't the main source of latency. So FileStore may not gain much from using rocksdb instead of leveldb.

On Wed, Mar 5, 2014 at 4:23 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>>>Hi Alexandre, below are the random io test results, almost the same iops.
>
> Thanks Xinxin, that doesn't seem too bad indeed, and latencies seem to be a little lower than leveldb.
>
> (Was this with 7.2k disks? Replication 2x or 3x?)
>
>
>
> ----- Original Message -----
>
> From: "Xinxin Shu" <xinxin.shu@xxxxxxxxx>
> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Sent: Tuesday, 4 March 2014 09:41:05
> Subject: RE: [RFC] add rocksdb support
>
> Hi Alexandre, below are the random io test results, almost the same iops.
>
> Rocksdb results:
>
> ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
> fio-2.1.4
> Starting 1 thread
> rbd engine: RBD version: 0.1.8
> Jobs: 1 (f=1): [w] [100.0% done] [0KB/23094KB/0KB /s] [0/5773/0 iops] [eta 00m:00s]
> ebs_test: (groupid=0, jobs=1): err= 0: pid=47154: Tue Mar 4 13:48:22 2014
> write: io=3356.2MB, bw=17183KB/s, iops=4295, runt=200004msec
> slat (usec): min=19, max=8855, avg=134.33, stdev=259.00
> clat (usec): min=73, max=4397.6K, avg=12756.12, stdev=79341.35
> lat (msec): min=1, max=4397, avg=12.89, stdev=79.34
> clat percentiles (usec):
> | 1.00th=[ 1432], 5.00th=[ 1752], 10.00th=[ 2128], 20.00th=[ 3408],
> | 30.00th=[ 4768], 40.00th=[ 5856], 50.00th=[ 6880], 60.00th=[ 7904],
> | 70.00th=[ 8896], 80.00th=[10048], 90.00th=[11968], 95.00th=[14016],
> | 99.00th=[27520], 99.50th=[505856], 99.90th=[1204224], 99.95th=[1433600],
> | 99.99th=[2834432]
> bw (KB /s): min= 403, max=24392, per=100.00%, avg=17358.47, stdev=7446.69
> lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
> lat (msec) : 2=8.36%, 4=15.77%, 10=55.27%, 20=19.17%, 50=0.51%
> lat (msec) : 100=0.09%, 250=0.16%, 500=0.14%, 750=0.19%, 1000=0.15%
> lat (msec) : 2000=0.16%, >=2000=0.01%
> cpu : usr=18.04%, sys=4.15%, ctx=1875119, majf=0, minf=838
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=1.1%, 16=10.9%, 32=65.9%, >=64=22.1%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=97.6%, 8=0.4%, 16=0.4%, 32=0.6%, 64=0.9%, >=64=0.0%
> issued : total=r=0/w=859165/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
> WRITE: io=3356.2MB, aggrb=17182KB/s, minb=17182KB/s, maxb=17182KB/s, mint=200004msec, maxt=200004msec
>
> Disk stats (read/write):
> sda: ios=0/2191, merge=0/2904, ticks=0/936, in_queue=936, util=0.29%
>
> leveldb results:
>
> ebs_test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
> fio-2.1.4
> Starting 1 thread
> rbd engine: RBD version: 0.1.8
> Jobs: 1 (f=1): [w] [100.0% done] [0KB/9428KB/0KB /s] [0/2357/0 iops] [eta 00m:00s]
> ebs_test: (groupid=0, jobs=1): err= 0: pid=112425: Tue Mar 4 14:54:00 2014
> write: io=3404.9MB, bw=17431KB/s, iops=4357, runt=200016msec
> slat (usec): min=20, max=7698, avg=114.01, stdev=201.06
> clat (usec): min=220, max=3278.3K, avg=13340.59, stdev=76874.35
> lat (msec): min=1, max=3278, avg=13.45, stdev=76.87
> clat percentiles (usec):
> | 1.00th=[ 1400], 5.00th=[ 1608], 10.00th=[ 1784], 20.00th=[ 2192],
> | 30.00th=[ 2832], 40.00th=[ 3824], 50.00th=[ 5024], 60.00th=[ 6240],
> | 70.00th=[ 7456], 80.00th=[ 8768], 90.00th=[10816], 95.00th=[13120],
> | 99.00th=[284672], 99.50th=[610304], 99.90th=[1089536], 99.95th=[1286144],
> | 99.99th=[1630208]
> bw (KB /s): min= 24, max=25548, per=100.00%, avg=17606.69, stdev=6779.23
> lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
> lat (msec) : 2=15.63%, 4=25.94%, 10=45.35%, 20=10.98%, 50=0.44%
> lat (msec) : 100=0.17%, 250=0.40%, 500=0.42%, 750=0.34%, 1000=0.19%
> lat (msec) : 2000=0.12%, >=2000=0.01%
> cpu : usr=18.25%, sys=4.14%, ctx=1887389, majf=0, minf=742
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.5%, 16=6.0%, 32=55.9%, >=64=37.5%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=97.8%, 8=0.7%, 16=0.5%, 32=0.5%, 64=0.5%, >=64=0.0%
> issued : total=r=0/w=871635/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
> WRITE: io=3404.9MB, aggrb=17431KB/s, minb=17431KB/s, maxb=17431KB/s, mint=200016msec, maxt=200016msec
>
> Disk stats (read/write):
> sda: ios=0/2125, merge=0/2796, ticks=0/708, in_queue=708, util=0.23%
>
> -----Original Message-----
> From: Alexandre DERUMIER [mailto:aderumier@xxxxxxxxx]
> Sent: Tuesday, March 04, 2014 12:49 PM
> To: Shu, Xinxin
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: [RFC] add rocksdb support
>
>>>Performance Test
>>>Attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool.
The performance results are quite promising.
>
> Thanks for your work, indeed the performance seems promising!
>
>>>Any comments or suggestions are greatly appreciated.
>
> Could you run a test with random write I/O using the latest fio (with rbd support)?
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-February/008182.html
>> The fio command: fio -direct=1 -iodepth=64 -thread -rw=randwrite
>>> -ioengine=rbd -bs=4k -size=19G -numjobs=1 -runtime=100
>>> -group_reporting -name=ebs_test -pool=openstack -rbdname=image
>>> -clientname=fio -invalidate=0
>
>
> ----- Original Message -----
>
> From: "Xinxin Shu" <xinxin.shu@xxxxxxxxx>
> To: ceph-devel@xxxxxxxxxxxxxxx
> Sent: Monday, 3 March 2014 03:07:18
> Subject: [RFC] add rocksdb support
>
> Hi all,
>
> This patch adds rocksdb support to ceph and enables rocksdb for the omap directory. The rocksdb source code can be obtained from link. To use rocksdb, the C++11 standard must be enabled; gcc version >= 4.7 is required for C++11 support. Rocksdb can be installed by following the instructions in the INSTALL.md file; the rocksdb header files (include/rocksdb/*) and library (librocksdb.so*) then need to be copied to the corresponding directories.
> To enable rocksdb, add the "--with-librocksdb" option to configure. The rocksdb branch is here (https://github.com/xinxinsh/ceph/tree/rocksdb).
>
>
> Performance Test
> The attached file is the performance comparison of rocksdb and leveldb on four nodes with 40 osds, using 'rados bench' as the test tool. The performance results are quite promising.
>
> Any comments or suggestions are greatly appreciated.
>
> Rados bench         Bandwidth (MB/s)      Average latency (s)
>                     Leveldb    rocksdb    Leveldb    rocksdb
> write 4 threads     263.762    272.549    0.061      0.059
> write 8 threads     449.834    457.811    0.071      0.070
> write 16 threads    642.100    638.972    0.100      0.100
> write 32 threads    705.897    717.598    0.181      0.178
> write 64 threads    705.011    717.204    0.370      0.362
> read 4 threads      873.588    841.704    0.073      0.076
> read 8 threads      816.699    818.451    0.078      0.078
> read 16 threads     808.810    798.053    0.079      0.080
> read 32 threads     798.394    802.796    0.080      0.080
> read 64 threads     792.848    790.593    0.081      0.081
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Best Regards,

Wheat
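
For anyone who wants to check the "identical within error bars" impression against the figures posted in this thread, here is a minimal Python sketch. It only computes the relative difference of rocksdb versus leveldb for each rados bench row and for the fio IOPS. The numbers are copied from the messages above; the variable and function names are made up for illustration, and since the thread gives a single run per configuration, this says nothing about the actual run-to-run variance.

```python
# Rados bench bandwidth in MB/s, copied from the table in the RFC message.
# Each tuple is (threads, leveldb, rocksdb). Names here are illustrative only.
WRITE_BW = [
    (4, 263.762, 272.549),
    (8, 449.834, 457.811),
    (16, 642.100, 638.972),
    (32, 705.897, 717.598),
    (64, 705.011, 717.204),
]
READ_BW = [
    (4, 873.588, 841.704),
    (8, 816.699, 818.451),
    (16, 808.810, 798.053),
    (32, 798.394, 802.796),
    (64, 792.848, 790.593),
]
# fio 4k randwrite IOPS from the two "write:" summary lines.
FIO_IOPS = {"leveldb": 4357, "rocksdb": 4295}


def rel_diff_pct(leveldb, rocksdb):
    """Percentage change of the rocksdb figure relative to the leveldb figure."""
    return (rocksdb - leveldb) / leveldb * 100.0


def summarize(rows):
    """Map thread count -> relative difference in percent, rounded to 2 places."""
    return {threads: round(rel_diff_pct(l, r), 2) for threads, l, r in rows}


write_diff = summarize(WRITE_BW)
read_diff = summarize(READ_BW)
fio_diff = round(rel_diff_pct(FIO_IOPS["leveldb"], FIO_IOPS["rocksdb"]), 2)

if __name__ == "__main__":
    for label, diffs in (("write", write_diff), ("read", read_diff)):
        for threads, pct in diffs.items():
            print(f"{label} {threads:>2} threads: rocksdb vs leveldb {pct:+.2f}%")
    print(f"fio randwrite iops: rocksdb vs leveldb {fio_diff:+.2f}%")
```

With the posted figures, every difference stays within a few percent in either direction. Whether that counts as noise can only be settled by repeating each run several times.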