Re: RocksDB Incorrect API Usage

Haomai Wang <haomai@xxxxxxxx> · Wed, 1 Jun 2016 02:12:49 +0800

On Wed, Jun 1, 2016 at 2:08 AM, Jianjian Huo <jianjian.huo@xxxxxxxxxxx> wrote:
> Hi Haomai,
>
> I noticed this as well, and made same changes to RocksDBStore in this PR last week:
> https://github.com/ceph/ceph/pull/9215
>
> One thing which is even worse,  seek will bypass row cache, so kv pairs won't be able to be cached in row cache.
> I am working to benchmark the performance impact, will publish the results after I am done this week.

Oh, cool! I think you can cherry-pick my another leveldb fix.

BTW, do you pay an attention to prefix seek api? I think it will be
more suitable than column family in ceph case. If we can have
well-defined prefix rule, we can make most of range query cheaper!

>
> Jianjian
>
> On Tue, May 31, 2016 at 9:49 AM, Haomai Wang <haomai@xxxxxxxx> wrote:
>> Hi Sage and Mark,
>>
>> As mentioned in BlueStore standup, I found rocksdb iterator *Seek*
>> won't use bloom filter like *Get*.
>>
>> *Get* impl: it will look at filter firstly
>> https://github.com/facebook/rocksdb/blob/master/table/block_based_table_reader.cc#L1369
>>
>> Iterator *Seek*: it will do binary search, by default we don't specify
>> prefix feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes).
>> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
>>
>> So I use a simple tests:
>>
>> ./db_bench -num 10000000  -benchmarks fillbatch
>> fill the db firstly with 1000w records.
>>
>> ./db_bench -use_existing_db  -benchmarks readrandomfast
>> readrandomfast case will use *Get* API to retrive data
>>
>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
>> readrandomfast
>>
>> LevelDB:    version 4.3
>> Date:       Wed Jun  1 00:29:16 2016
>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>> CPUCache:   20480 KB
>> Keys:       16 bytes each
>> Values:     100 bytes each (50 bytes after compression)
>> Entries:    1000000
>> Prefix:    0 bytes
>> Keys per prefix:    0
>> RawSize:    110.6 MB (estimated)
>> FileSize:   62.9 MB (estimated)
>> Writes per second: 0
>> Compression: Snappy
>> Memtablerep: skip_list
>> Perf Level: 0
>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>> ------------------------------------------------
>> DB path: [/tmp/rocksdbtest-0/dbbench]
>> readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
>> 1000100 found, issued 46639 non-exist keys)
>>
>> ===========================
>> then I modify readrandomfast to use Iterator API[0]:
>>
>> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
>> readrandomfast
>> LevelDB:    version 4.3
>> Date:       Wed Jun  1 00:33:03 2016
>> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
>> CPUCache:   20480 KB
>> Keys:       16 bytes each
>> Values:     100 bytes each (50 bytes after compression)
>> Entries:    1000000
>> Prefix:    0 bytes
>> Keys per prefix:    0
>> RawSize:    110.6 MB (estimated)
>> FileSize:   62.9 MB (estimated)
>> Writes per second: 0
>> Compression: Snappy
>> Memtablerep: skip_list
>> Perf Level: 0
>> WARNING: Assertions are enabled; benchmarks unnecessarily slow
>> ------------------------------------------------
>> DB path: [/tmp/rocksdbtest-0/dbbench]
>> readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
>> 1000100 found, issued 46639 non-exist keys)
>>
>>
>> 45.18 us/op vs 4.57us/op!
>>
>> The test can be repeated and easy to do! Plz correct if I'm doing
>> foolish thing I'm not aware..
>>
>> So I proposal this PR: https://github.com/ceph/ceph/pull/9411
>>
>> We still can make further improvements by scanning all iterate usage
>> to make it better!
>>
>> [0]:
>> --- a/db/db_bench.cc
>> +++ b/db/db_bench.cc
>> @@ -2923,14 +2923,12 @@ class Benchmark {
>>          int64_t key_rand = thread->rand.Next() & (pot - 1);
>>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>>          ++read;
>> -        auto status = db->Get(options, key, &value);
>> -        if (status.ok()) {
>> -          ++found;
>> -        } else if (!status.IsNotFound()) {
>> -          fprintf(stderr, "Get returned an error: %s\n",
>> -                  status.ToString().c_str());
>> -          abort();
>> -        }
>> +        Iterator* iter = db->NewIterator(options);
>> +      iter->Seek(key);
>> +      if (iter->Valid() && iter->key().compare(key) == 0) {
>> +        found++;
>> +      }
>> +
>>          if (key_rand >= FLAGS_num) {
>>            ++nonexist;
>>          }
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html