Hi Haomai,

I noticed this as well, and made the same changes to RocksDBStore in this PR last week: https://github.com/ceph/ceph/pull/9215

What's even worse: Seek bypasses the row cache, so key/value pairs read through an iterator can't be cached in the row cache.

I am working on benchmarking the performance impact and will publish the results when I'm done this week.

Jianjian

On Tue, May 31, 2016 at 9:49 AM, Haomai Wang <haomai@xxxxxxxx> wrote:
> Hi Sage and Mark,
>
> As mentioned in the BlueStore standup, I found that the RocksDB iterator
> *Seek* doesn't use the bloom filter the way *Get* does.
>
> *Get* impl: it checks the filter first
> https://github.com/facebook/rocksdb/blob/master/table/block_based_table_reader.cc#L1369
>
> Iterator *Seek*: it does a binary search; by default we don't specify
> the prefix feature (https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes).
> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
>
> So I ran a simple test:
>
> ./db_bench -num 10000000 -benchmarks fillbatch
> First fill the db with 10 million records.
>
> ./db_bench -use_existing_db -benchmarks readrandomfast
> The readrandomfast case uses the *Get* API to retrieve data:
>
> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db -benchmarks readrandomfast
>
> LevelDB: version 4.3
> Date: Wed Jun 1 00:29:16 2016
> CPU: 32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> CPUCache: 20480 KB
> Keys: 16 bytes each
> Values: 100 bytes each (50 bytes after compression)
> Entries: 1000000
> Prefix: 0 bytes
> Keys per prefix: 0
> RawSize: 110.6 MB (estimated)
> FileSize: 62.9 MB (estimated)
> Writes per second: 0
> Compression: Snappy
> Memtablerep: skip_list
> Perf Level: 0
> WARNING: Assertions are enabled; benchmarks unnecessarily slow
> ------------------------------------------------
> DB path: [/tmp/rocksdbtest-0/dbbench]
> readrandomfast : 4.570 micros/op 218806 ops/sec; (1000100 of 1000100 found, issued 46639 non-exist keys)
>
> ===========================
> Then I modified readrandomfast to use the Iterator API [0]:
>
> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db -benchmarks readrandomfast
> LevelDB: version 4.3
> Date: Wed Jun 1 00:33:03 2016
> CPU: 32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> CPUCache: 20480 KB
> Keys: 16 bytes each
> Values: 100 bytes each (50 bytes after compression)
> Entries: 1000000
> Prefix: 0 bytes
> Keys per prefix: 0
> RawSize: 110.6 MB (estimated)
> FileSize: 62.9 MB (estimated)
> Writes per second: 0
> Compression: Snappy
> Memtablerep: skip_list
> Perf Level: 0
> WARNING: Assertions are enabled; benchmarks unnecessarily slow
> ------------------------------------------------
> DB path: [/tmp/rocksdbtest-0/dbbench]
> readrandomfast : 45.188 micros/op 22129 ops/sec; (1000100 of 1000100 found, issued 46639 non-exist keys)
>
>
> 45.18 us/op vs 4.57 us/op!
>
> The test is easy to repeat! Please correct me if I'm doing something
> foolish that I'm not aware of.
>
> So I propose this PR: https://github.com/ceph/ceph/pull/9411
>
> We can still make further improvements by reviewing all of our iterator
> usage!
>
> [0]:
> --- a/db/db_bench.cc
> +++ b/db/db_bench.cc
> @@ -2923,14 +2923,12 @@ class Benchmark {
>        int64_t key_rand = thread->rand.Next() & (pot - 1);
>        GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>        ++read;
> -      auto status = db->Get(options, key, &value);
> -      if (status.ok()) {
> -        ++found;
> -      } else if (!status.IsNotFound()) {
> -        fprintf(stderr, "Get returned an error: %s\n",
> -                status.ToString().c_str());
> -        abort();
> -      }
> +      Iterator* iter = db->NewIterator(options);
> +      iter->Seek(key);
> +      if (iter->Valid() && iter->key().compare(key) == 0) {
> +        found++;
> +      }
> +
>        if (key_rand >= FLAGS_num) {
>          ++nonexist;
>        }
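The Get-vs-Seek difference Haomai describes can be illustrated with a self-contained toy model. This is a sketch under simplifying assumptions and is not RocksDB code. Each hypothetical `ToyTable` stands in for one sorted run (memtable, L0 file or level) and carries a small whole-key bloom filter. The 1024 bits and 3 salted `std::hash` probes are arbitrary choices. `GetStyleSearches` (hypothetical) consults the filter first and stops at the first hit, as a point lookup can. `SeekStyleSearches` (hypothetical) has to binary-search every run, because the smallest key >= the target may live in any of them and a whole-key filter cannot rule that out. Prefix bloom filters (the Prefix-Seek link in the quoted mail) are left out of the model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for one sorted run with a whole-key bloom filter.
// Illustrative only: names, sizes and hashing are assumptions, not RocksDB.
class ToyTable {
 public:
  explicit ToyTable(std::vector<std::string> keys, std::size_t filter_bits = 1024)
      : keys_(std::move(keys)), bits_(filter_bits, false) {
    assert(filter_bits > 0);
    std::sort(keys_.begin(), keys_.end());
    for (const auto& k : keys_)
      for (std::size_t i = 0; i < kNumHashes; ++i) bits_[Slot(k, i)] = true;
  }

  // Bloom filter probe: may report false positives, never false negatives.
  bool MayContain(const std::string& key) const {
    for (std::size_t i = 0; i < kNumHashes; ++i)
      if (!bits_[Slot(key, i)]) return false;
    return true;
  }

  // Binary search for the first key >= target, or nullptr if there is none.
  const std::string* LowerBound(const std::string& target) const {
    auto it = std::lower_bound(keys_.begin(), keys_.end(), target);
    return it == keys_.end() ? nullptr : &*it;
  }

 private:
  static constexpr std::size_t kNumHashes = 3;

  // Salt the key with the probe index to derive kNumHashes bit positions.
  std::size_t Slot(const std::string& key, std::size_t i) const {
    return std::hash<std::string>{}(key + static_cast<char>('0' + i)) % bits_.size();
  }

  std::vector<std::string> keys_;
  std::vector<bool> bits_;
};

// Get-style lookup: skip runs whose filter says "definitely absent" and stop
// at the first exact match. Returns the number of binary searches performed.
std::size_t GetStyleSearches(const std::vector<ToyTable>& tables,
                             const std::string& key, bool* found) {
  std::size_t searches = 0;
  *found = false;
  for (const auto& t : tables) {
    if (!t.MayContain(key)) continue;  // filter hit avoids the search entirely
    ++searches;
    const std::string* k = t.LowerBound(key);
    if (k && *k == key) {
      *found = true;
      break;
    }
  }
  return searches;
}

// Seek-style lookup without a prefix filter: every run is positioned, and the
// smallest candidate >= key wins. Returns the number of binary searches.
std::size_t SeekStyleSearches(const std::vector<ToyTable>& tables,
                              const std::string& key, bool* found) {
  std::size_t searches = 0;
  const std::string* best = nullptr;
  for (const auto& t : tables) {
    ++searches;
    const std::string* k = t.LowerBound(key);
    if (k && (!best || *k < *best)) best = k;
  }
  *found = best && *best == key;
  return searches;
}
```

The example uses eight runs of ten keys each, with the target in the first run. In that case `GetStyleSearches` performs one binary search and `SeekStyleSearches` performs eight. A real LSM tree adds block reads on top of each search, so the gap only widens.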
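The patch in [0] emulates a point lookup with an iterator. `Seek` positions at the first key >= the target, so the exact-key comparison is what turns it into a Get. As quoted, the patch never deletes `iter`. RocksDB's `DB::NewIterator` returns an `Iterator*` that the caller must delete, and wrapping it in a `std::unique_ptr` is the usual way to do that. The following is a minimal sketch of the pattern, assuming a hypothetical in-memory `ToyIterator` (created by a hypothetical `NewToyIterator` factory) over `std::map` in place of RocksDB, so that it compiles on its own. Only the Seek/Valid/key shape mirrors the patch.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

using KvMap = std::map<std::string, std::string>;

// Hypothetical iterator with the Seek/Valid/key shape used in the patch.
// It walks an in-memory std::map; it is not rocksdb::Iterator.
class ToyIterator {
 public:
  explicit ToyIterator(const KvMap& data) : data_(data), it_(data.end()) {}
  void Seek(const std::string& target) { it_ = data_.lower_bound(target); }
  bool Valid() const { return it_ != data_.end(); }
  const std::string& key() const { return it_->first; }
  const std::string& value() const { return it_->second; }

 private:
  const KvMap& data_;
  KvMap::const_iterator it_;
};

// Hypothetical factory returning an owning raw pointer, mirroring the shape
// of a NewIterator() call whose result the caller must free.
ToyIterator* NewToyIterator(const KvMap& data) { return new ToyIterator(data); }

// Point lookup emulated with Seek. Seek stops at the first key >= key, so the
// exact comparison is required; unique_ptr frees the iterator on every path.
bool LookupViaSeek(const KvMap& data, const std::string& key, std::string* value) {
  assert(value != nullptr);
  std::unique_ptr<ToyIterator> iter(NewToyIterator(data));
  iter->Seek(key);
  if (iter->Valid() && iter->key() == key) {
    *value = iter->value();
    return true;
  }
  return false;
}
```

For example, `LookupViaSeek(data, "b", &v)` returns false on a map holding "apple" and "banana". Seek lands on "banana", which is not an exact match, and that is why the patch compares `iter->key()` against the target.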