Re: mon switch from leveldb to rocksdb

Gregory Farnum <gfarnum@xxxxxxxxxx> · Tue, 3 May 2016 09:41:01 -0700

On Tue, May 3, 2016 at 6:34 AM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> On 05/02/2016 02:00 PM, Howard Chu wrote:
>>
>> Sage Weil wrote:
>>>
>>> 1) Thoughts on moving to rocksdb in general?
>>
>>
>> Are you actually prepared to undertake all of the measurement and tuning
>> required to make RocksDB actually work well? You're switching from an
>> (abandoned/unsupported) engine with only a handful of config parameters
>> to one with ~40-50 params, all of which have critical but unpredictable
>> impact on resource consumption and performance.
>>
>
> You are absolutely correct, and there are definitely pitfalls we need to
> watch out for with the number of tunables in rocksdb.  At least on the
> performance side, two of the big issues we've hit with leveldb are
> compaction related.  In some scenarios compaction cannot keep up with the
> rate of incoming writes, resulting in ever-growing db sizes.  The other
> issue is that compaction is single threaded, and this can cause stalls and
> general mayhem when things get really heavily loaded.  My hope is that if
> we do go with rocksdb, even in a sub-optimally tuned state, we'll be better
> off than we were with leveldb.
>
> We did some very preliminary benchmarks a couple of years ago (admittedly
> on a too-small dataset), basically comparing the (at the time) stock ceph
> leveldb settings vs rocksdb.  At that dataset size, leveldb looked much
> better for reads, but much worse for writes.

That's actually a bit troubling: many of our monitor problems have
arisen from slow reads rather than slow writes. I suspect we want to
eliminate this before switching, if it's a concern.

...Although I think I did see a monitor caching layer go by, so maybe
it's a moot point now?
-Greg