RE: RocksDB Incorrect API Usage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Haomai,

I noticed this as well, and made same changes to RocksDBStore in this PR last week:
https://github.com/ceph/ceph/pull/9215

One thing which is even worse,  seek will bypass row cache, so kv pairs won't be able to be cached in row cache.
I am working to benchmark the performance impact, will publish the results after I am done this week.

Jianjian

On Tue, May 31, 2016 at 9:49 AM, Haomai Wang <haomai@xxxxxxxx> wrote:
> Hi Sage and Mark,
>
> As mentioned in BlueStore standup, I found rocksdb iterator *Seek*
> won't use bloom filter like *Get*.
>
> *Get* impl: it will look at filter firstly
> https://github.com/facebook/rocksdb/blob/master/table/block_based_table_reader.cc#L1369
>
> Iterator *Seek*: it will do binary search, by default we don't specify
> prefix feature(https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes).
> https://github.com/facebook/rocksdb/blob/master/table/block.cc#L94
>
> So I use a simple tests:
>
> ./db_bench -num 10000000  -benchmarks fillbatch
> fill the db firstly with 1000w records.
>
> ./db_bench -use_existing_db  -benchmarks readrandomfast
> readrandomfast case will use *Get* API to retrive data
>
> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
> readrandomfast
>
> LevelDB:    version 4.3
> Date:       Wed Jun  1 00:29:16 2016
> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> CPUCache:   20480 KB
> Keys:       16 bytes each
> Values:     100 bytes each (50 bytes after compression)
> Entries:    1000000
> Prefix:    0 bytes
> Keys per prefix:    0
> RawSize:    110.6 MB (estimated)
> FileSize:   62.9 MB (estimated)
> Writes per second: 0
> Compression: Snappy
> Memtablerep: skip_list
> Perf Level: 0
> WARNING: Assertions are enabled; benchmarks unnecessarily slow
> ------------------------------------------------
> DB path: [/tmp/rocksdbtest-0/dbbench]
> readrandomfast :       4.570 micros/op 218806 ops/sec; (1000100 of
> 1000100 found, issued 46639 non-exist keys)
>
> ===========================
> then I modify readrandomfast to use Iterator API[0]:
>
> [root@hunter-node2 rocksdb]# ./db_bench -use_existing_db  -benchmarks
> readrandomfast
> LevelDB:    version 4.3
> Date:       Wed Jun  1 00:33:03 2016
> CPU:        32 * Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
> CPUCache:   20480 KB
> Keys:       16 bytes each
> Values:     100 bytes each (50 bytes after compression)
> Entries:    1000000
> Prefix:    0 bytes
> Keys per prefix:    0
> RawSize:    110.6 MB (estimated)
> FileSize:   62.9 MB (estimated)
> Writes per second: 0
> Compression: Snappy
> Memtablerep: skip_list
> Perf Level: 0
> WARNING: Assertions are enabled; benchmarks unnecessarily slow
> ------------------------------------------------
> DB path: [/tmp/rocksdbtest-0/dbbench]
> readrandomfast :      45.188 micros/op 22129 ops/sec; (1000100 of
> 1000100 found, issued 46639 non-exist keys)
>
>
> 45.18 us/op vs 4.57us/op!
>
> The test can be repeated and easy to do! Plz correct if I'm doing
> foolish thing I'm not aware..
>
> So I proposal this PR: https://github.com/ceph/ceph/pull/9411
>
> We still can make further improvements by scanning all iterate usage
> to make it better!
>
> [0]:
> --- a/db/db_bench.cc
> +++ b/db/db_bench.cc
> @@ -2923,14 +2923,12 @@ class Benchmark {
>          int64_t key_rand = thread->rand.Next() & (pot - 1);
>          GenerateKeyFromInt(key_rand, FLAGS_num, &key);
>          ++read;
> -        auto status = db->Get(options, key, &value);
> -        if (status.ok()) {
> -          ++found;
> -        } else if (!status.IsNotFound()) {
> -          fprintf(stderr, "Get returned an error: %s\n",
> -                  status.ToString().c_str());
> -          abort();
> -        }
> +        Iterator* iter = db->NewIterator(options);
> +      iter->Seek(key);
> +      if (iter->Valid() && iter->key().compare(key) == 0) {
> +        found++;
> +      }
> +
>          if (key_rand >= FLAGS_num) {
>            ++nonexist;
>          }
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
��.n��������+%������w��{.n����z��u���ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux